ABSTRACT

Recent improvements in positioning technology have led to
a much wider availability of massive moving object data. 
A crucial task is to find the moving objects that travel to- 
gether. Usually, these object sets are called spatio-temporal 
patterns. Due to the emergence of many different kinds 
of spatio-temporal patterns in recent years, different ap- 
proaches have been proposed to extract them. However, 
each approach only focuses on mining a specific kind of pat- 
tern. In addition to being a painstaking task due to the 
large number of algorithms used to mine and manage pat- 
terns, it is also time consuming. Moreover, we have to exe- 
cute these algorithms again whenever new data are added to 
the existing database. To address these issues, we first re- 
define spatio-temporal patterns in the itemset context. Sec- 
ondly, we propose a unifying approach, named GeT_Move, 
which uses a frequent closed itemset-based spatio-temporal 
pattern-mining algorithm to mine and manage different 
spatio-temporal patterns. GeT_Move is implemented in two 
versions which are GeT_Move and Incremental GeT_Move. 
To improve efficiency and remove the need for parameter setting,
we also propose a Parameter Free Incremental GeT_Move algorithm.
Comprehensive experiments are performed on real
datasets as well as large synthetic datasets to demonstrate 
the effectiveness and efficiency of our approaches. 

1. INTRODUCTION 

Nowadays, many electronic devices are used for real world 
applications. Telemetry attached to wildlife, GPS installed
in cars, sensor networks, and mobile phones have enabled
the tracking of almost any kind of data and have led to an
increasingly large amount of data that contain moving ob- 
jects and numerical data. Therefore, analysis on such data 
to find interesting patterns is attracting increasing attention 
for applications such as movement pattern analysis, animal 
behavior study, route planning and vehicle control. 

Early approaches designed to recover information from 
spatio-temporal datasets included ad-hoc queries aimed at
answering queries concerning a single predicate range or
nearest neighbour. For instance, "finding all the moving ob- 
jects inside area A between 10:00 am and 2:00 pm" or "how 
many cars were driven between Main Square and the Airport 
on Friday" [29]. Spatial query extensions in GIS applications 
are able to run this type of query. However, these techniques 
are used to find the best solution by exploring each spatial 
object at a specific time according to some metric distance 
measurement (usually Euclidean). As a result, it is difficult
to capture collective behaviour and correlations among the
involved entities using this type of query.

Recently, there has been growing interest in the querying 
of patterns which capture 'group' or 'common' behaviour
among moving entities. This is particularly true for the 
development of approaches to identify groups of moving ob- 
jects for which a strong relationship and interaction exist 
within a defined spatial region during a given time dura- 
tion. Some examples of these patterns are flocks [1, 2, 14, 
15], moving clusters [4, 18, 7], convoy queries [3, 16], stars 
and k-stars [17], closed swarms [6, 13], group patterns [21], 
periodic patterns [25], co-location patterns [22], TraLus [23],
etc.

To extract these kinds of patterns, different algorithms 
have been proposed. Naturally, the computation is costly 
and time consuming because we need to execute different 
algorithms consecutively. However, if we had an algorithm 
which could extract different kinds of patterns, the computation
costs would be significantly decreased and the process
would be much less time consuming. Therefore, we need to 
develop an efficient unifying algorithm. 

In some applications (e.g. cars), object locations are 
continuously reported by using Global Positioning System 
(GPS). Therefore, new data is always available. If we do not 
have an incremental algorithm, we need to re-execute the
algorithms again and again on the whole database, including existing
data and new data, to extract patterns. This is, of course,
cost-prohibitive and time consuming. An incremental al- 
gorithm can indeed improve the process by combining the 
results extracted from the existing data and the new data 
to obtain the final results. 

With the above issues in mind, we propose GeT_Move: 
a unifying incremental spatio-temporal pattern-mining ap- 
proach. Part of this approach is based on an existing state- 
of-the-art algorithm which is extended to take advantage of 
well-known frequent closed itemset mining algorithms. In 
order to use it, we first redefine spatio-temporal patterns in 
an itemset context. Secondly, we propose a spatio-temporal 
matrix to describe original data and then an incremental frequent
closed itemset-based spatio-temporal pattern-mining
algorithm to extract patterns. 

Naturally, obtaining the optimal parameters is a difficult
task for most algorithms which require parameter setting.
Even if we are able to obtain the optimal parameters
after many executions and evaluations of the results on a
dataset, the optimal parameter values will be different
on other datasets. To tackle this issue, to optimize
efficiency and to remove the need for parameter setting, we
propose a parameter free incremental GeT_Move. The main
idea is to re-arrange the input data based on the nested concept
[31] so that incremental GeT_Move can automatically and
efficiently extract patterns without parameter setting.

The main contributions of this paper are summarized be- 
low. 

• We re-define spatio-temporal pattern mining in the
itemset context, which enables us to effectively extract different
kinds of spatio-temporal patterns.

• We present incremental approaches, named GeT_Move 
and Incremental GeT_Move, which efficiently extract fre- 
quent closed itemsets from which spatio-temporal pat- 
terns are retrieved. 

• We design and propose a parameter free incremental
GeT_Move. The advantage of this approach is that it
does not require the user to set parameters and automatically
extracts patterns efficiently.

• We deal with newly arriving data by proposing an
algorithm based on the explicit combination of pairs of
frequent closed itemsets, which efficiently
combines the results from the existing database with
the arriving data to obtain the final results.

• We present comprehensive experimental results over 
both real and synthetic databases. The results demon- 
strate that our techniques enable us to effectively extract 
different kinds of patterns. Furthermore, our approaches
are more efficient than other algorithms in most
cases.

The remaining sections of the paper are organized as 
follows. Section 2 discusses preliminary definitions of the 
spatio-temporal patterns as well as the related work. The 
patterns such as swarms, closed swarms, convoys and group 
patterns are redefined in an itemset context in Section 3. We 
present our approaches in Section 4. Experiments testing ef- 
fectiveness and efficiency are shown in Section 5. Finally, we 
draw our conclusions in Section 6. 

2. SPATIO-TEMPORAL PATTERNS 

In this section we briefly give an overview of the main
spatio-temporal patterns. We first define the different kinds
of patterns and then discuss the related work.

2.1 Preliminary Definitions 

The problem of spatio-temporal patterns has been extensively
addressed over the last years. Basically, spatio-temporal
patterns are designed to group similar trajectories
or objects which tend to move together during a time interval.
Many different definitions can thus be proposed, and
many patterns have been defined to date, such as flocks [1,
2, 14, 15], convoys [3, 16], swarms, closed swarms [6, 13], 
moving clusters [4, 18, 7] and even periodic patterns [25]. 

Figure 1: An example of swarm and convoy where $c_1, c_2, c_3, c_4$
are clusters gathering close objects together at specific timestamps.

In this paper, we focus on proposing a unifying approach
to effectively and efficiently extract all these different kinds
of patterns. First of all, we assume that we have a group of
moving objects $O_{DB} = \{o_1, o_2, \ldots, o_z\}$, a set of timestamps
$T_{DB} = \{t_1, t_2, \ldots, t_n\}$ and, at each timestamp $t_i \in T_{DB}$,
spatial information$^1$ $x, y$ for each object. For example, Table 1
illustrates an example of a spatio-temporal database.
Usually, in spatio-temporal mining, we are interested in extracting
a group of objects staying together during a period.
Therefore, from now on, $O = \{o_{i_1}, o_{i_2}, \ldots, o_{i_p}\}$ ($O \subseteq O_{DB}$)
stands for a group of objects and $T = \{t_{a_1}, t_{a_2}, \ldots, t_{a_m}\}$
($T \subseteq T_{DB}$) is the set of timestamps within which the objects stay
together. Let $\varepsilon$ be the user-defined threshold standing for the
minimum number of objects and $min_t$ the minimum number
of timestamps. Thus $|O|$ (resp. $|T|$) must be greater than
or equal to $\varepsilon$ (resp. $min_t$).

In the following, we formally define all the different kinds 
of patterns. 

Informally, a swarm is a group of moving objects $O$ containing
at least $\varepsilon$ individuals which are close to each other for
at least $min_t$ timestamps. A swarm can then be formally
defined as follows:

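Definition 1. Swarm [6]. A pair $(O, T)$ is a swarm if:

$$
\begin{cases}
(1): \forall t_{a_i} \in T, \exists c \text{ s.t. } O \subseteq c, \ c \text{ is a cluster.} \\
\quad \text{There is at least one cluster containing all the objects in } O \text{ at each timestamp in } T. \\
(2): |O| \geq \varepsilon. \\
\quad \text{There must be at least } \varepsilon \text{ objects.} \\
(3): |T| \geq min_t. \\
\quad \text{There must be at least } min_t \text{ timestamps.}
\end{cases}
\tag{1}
$$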

For example, as shown in Figure 1a, if we set $\varepsilon = 2$
and $min_t = 2$, we can find the following swarms:
$(\{o_1, o_2\}, \{t_1, t_3\})$, $(\{o_1, o_2\}, \{t_1, t_4\})$, $(\{o_1, o_2\}, \{t_3, t_4\})$,
$(\{o_1, o_2\}, \{t_1, t_3, t_4\})$. We can note that these swarms are
in fact redundant since they can be grouped together in the
following swarm: $(\{o_1, o_2\}, \{t_1, t_3, t_4\})$.

$^1$ Spatial information can be, for instance, GPS location.

To avoid this redundancy, Zhenhui Li et al. [6] propose the 
notion of closed swarm for grouping together both objects 
and time. A swarm (O, T) is object-closed if, when fixing 
T, O cannot be enlarged. Similarly, a swarm (O, T) is time- 
closed if, when fixing O, T cannot be enlarged. Finally, a 
swarm (O, T) is a closed swarm if it is both object-closed
and time-closed and can be defined as follows:

Definition 2. Closed Swarm [6]. A pair $(O, T)$ is a
closed swarm if:

$$
\begin{cases}
(1): (O, T) \text{ is a swarm.} \\
(2): \nexists O' \text{ s.t. } (O', T) \text{ is a swarm and } O \subset O'. \\
(3): \nexists T' \text{ s.t. } (O, T') \text{ is a swarm and } T \subset T'.
\end{cases}
\tag{2}
$$




Figure 2: A group pattern example. 
Figure 3: A periodic pattern example.



For instance, in the previous example,
$(\{o_1, o_2\}, \{t_1, t_3, t_4\})$ is a closed swarm.
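Definitions 1 and 2 can be checked by brute force on a tiny example. The sketch below uses hypothetical cluster data chosen to agree with the example above ($o_1$ and $o_2$ share a cluster at $t_1$, $t_3$ and $t_4$); the names `together`, `swarms` and `closed_swarms` are illustrative only, and the exhaustive enumeration is meant to make the definitions concrete, not to be an efficient mining algorithm.

```python
from itertools import combinations

# Hypothetical clustering: timestamp -> list of clusters (sets of object ids).
clusters = {
    1: [{"o1", "o2"}],
    2: [{"o1"}, {"o2"}],
    3: [{"o1", "o2"}],
    4: [{"o1", "o2"}],
}

def together(objs, t):
    """True if some cluster at timestamp t contains every object in objs."""
    return any(objs <= c for c in clusters[t])

def swarms(objects, eps, min_t):
    """Enumerate every swarm (O, T) by exhaustive search (Definition 1)."""
    found = []
    times = sorted(clusters)
    for k in range(eps, len(objects) + 1):
        for objs in combinations(sorted(objects), k):
            objs = frozenset(objs)
            for m in range(min_t, len(times) + 1):
                for ts in combinations(times, m):
                    if all(together(objs, t) for t in ts):
                        found.append((objs, frozenset(ts)))
    return found

def closed_swarms(all_swarms):
    """Keep swarms that cannot be enlarged in objects or in time (Definition 2)."""
    closed = []
    for o, t in all_swarms:
        bigger_o = any(t2 == t and o < o2 for o2, t2 in all_swarms)
        bigger_t = any(o2 == o and t < t2 for o2, t2 in all_swarms)
        if not bigger_o and not bigger_t:
            closed.append((o, t))
    return closed

all_swarms = swarms({"o1", "o2"}, eps=2, min_t=2)
result = closed_swarms(all_swarms)
```

With this data, the enumeration finds the four swarms listed in the example, and only the one with timestamps 1, 3 and 4 survives the closedness filter.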

A convoy is also a group of objects such that these objects
are close to each other during at least $min_t$ time points.
The main difference between a convoy and a swarm (or closed
swarm) is that convoy lifetimes must be consecutive. In essence,
by adding the consecutiveness condition to swarms,
we can define a convoy as follows:

Definition 3. Convoy [3]. A pair $(O, T)$ is a convoy if:

$$
\begin{cases}
(1): (O, T) \text{ is a swarm.} \\
(2): \forall i, 1 \leq i < |T|, \ t_{a_i}, t_{a_{i+1}} \text{ are consecutive.}
\end{cases}
\tag{3}
$$

For instance, in Figure 1b, with $\varepsilon = 2$, $min_t = 2$
we have two convoys: $(\{o_1, o_2\}, \{t_1, t_2, t_3, t_4\})$ and
$(\{o_1, o_2, o_3\}, \{t_3, t_4\})$.

Until now, we have considered groups of objects
that move close to each other for a long time interval.
For instance, as shown in [28], moving clusters and different
kinds of flocks share essentially the same definition.
Basically, the main difference lies in the clustering techniques
used. Flocks, for instance, usually consider a rigid
definition of the radius while moving clusters and convoys
apply a density-based clustering algorithm (e.g. DBScan
[5]). Moving clusters can be seen as special cases of convoys
with the additional condition that they need to share some
objects between two consecutive timestamps [28]. Therefore,
in the following, for the sake of brevity and clarity, we will mainly
focus on convoys and density-based clustering algorithms.

According to the previous definitions, the main difference 
between convoys and swarms is about the consecutiveness 
and non-consecutiveness of clusters during a time interval. 
In [21], Hwang et al. propose a general pattern, called a 
group pattern, which essentially is a combination of both 
convoys and closed swarms. Basically, a group pattern is a set
of disjointed convoys which are generated by the same group
of objects in different time intervals. By considering a convoy
as a timepoint, a group pattern can be seen as a swarm
of disjointed convoys. Additionally, a group pattern cannot be
enlarged in terms of objects and number of convoys. Therefore,
a group pattern is essentially a closed swarm of disjointed
convoys. Formally, a group pattern can be defined as follows:



Definition 4. Group Pattern [21]. Given a set of objects
$O$, a minimum weight threshold $min_{wei}$, a set of disjointed
convoys $T_S = \{s_1, s_2, \ldots, s_n\}$ and a minimum number of convoys
$min_c$, $(O, T_S)$ is a group pattern if:

$$
\begin{cases}
(1): (O, T_S) \text{ is a closed swarm with } \varepsilon, min_c. \\
(2): \dfrac{\sum_{i=1}^{|T_S|} |s_i|}{|T_{DB}|} \geq min_{wei}.
\end{cases}
\tag{4}
$$

Note that $min_c$ is only applied for $T_S$ (e.g. $|T_S| \geq min_c$).

For instance, see Figure 2, with $min_t = 2$
and $\varepsilon = 2$ we have a set of convoys $T_S =
\{(\{o_1, o_2\}, \{t_1, t_2\}), (\{o_1, o_2\}, \{t_4, t_5\})\}$. Additionally, with
$min_c = 1$, $(\{o_1, o_2\}, T_S)$ is a closed swarm of convoys
because $|T_S| = 2 \geq min_c$, $|O| \geq \varepsilon$ and $(O, T_S)$ cannot
be enlarged. Furthermore, with $min_{wei} = 0.5$, $(O, T_S)$ is a
group pattern since $\frac{|\{t_1, t_2\}| + |\{t_4, t_5\}|}{|T_{DB}|} \geq min_{wei}$.

Previously, we overviewed patterns in which group ob- 
jects move together during some time intervals. However, 
mining patterns from individual object movement is also in- 
teresting. In [25], N. Mamoulis et al. propose the notion of 
periodic patterns in which an object follows the same routes 
(approximately) over regular time intervals. For example, 
people wake up at the same time and generally follow the 
same route to their work every day. Informally, we are given an
object's trajectory including $M$ timepoints and $T_P$, which is the
number of timestamps after which a pattern may re-appear. The object's
trajectory is decomposed into $\lfloor M / T_P \rfloor$ sub-trajectories.
$T_P$ is data-dependent and has no definite value. For example,
$T_P$ can be set to 'a day' in traffic control applications
since many vehicles have daily patterns, while annual animal
migration patterns can be discovered by $T_P$ = 'a year'. For
instance, see Figure 3, an object's trajectory is decomposed
into daily sub-trajectories.
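The decomposition step can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `split_trajectory`, the sample positions and the choice to drop any trailing points that do not fill a whole period (which follows from the floor in $\lfloor M / T_P \rfloor$) are not taken from the original work.

```python
def split_trajectory(points, period):
    """Split a trajectory into floor(M / period) consecutive sub-trajectories.

    Trailing points that do not fill a complete period are dropped
    (an assumption made for this sketch).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    count = len(points) // period
    return [points[i * period:(i + 1) * period] for i in range(count)]

# Hypothetical trajectory of M = 7 positions with a period of 3 timestamps.
trajectory = [(0, 0), (1, 0), (2, 1), (0, 0), (1, 1), (2, 1), (0, 1)]
subs = split_trajectory(trajectory, period=3)
```

Each resulting sub-trajectory can then be treated like a separate object when looking for patterns that re-appear across periods.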

Essentially, a periodic pattern is a closed swarm discovered
from the $\lfloor M / T_P \rfloor$ sub-trajectories. For instance, in Figure
3, we have 3 daily sub-trajectories and from them we extract
the two following periodic patterns: $\{o_1, o_2, o_3, o_4\}$ and
$\{o_1, o_3, o_4\}$. The main difference in periodic pattern mining
is the data preprocessing step, while the definition is similar
to that of a closed swarm. As we have provided the definition
of a closed swarm, we will mainly focus on closed swarm
mining below.




Table 2: Cluster Matrix 



Figure 4: An illustrative example. 



2.2 Related Work 

As mentioned before, many approaches have been proposed
to extract patterns. The interested reader may refer
to [20, 28] where short descriptions of the most efficient or
interesting patterns and approaches are given. For instance,
Gudmundsson and van Kreveld [1], Vieira et al. [2]
define a flock pattern, in which the same set of objects stay 
together in a circular region with a predefined radius, Kalnis 
et al. [4] propose the notion of moving clusters, while Jeung 
et al. [3] define a convoy pattern. 

Jeung et al. [3] adopt the DBScan algorithm [5] to find 
candidate convoy patterns. The authors propose three algo- 
rithms that incorporate trajectory simplification techniques 
in the first step. The distance measurements are performed 
on trajectory segments as opposed to point-based distance
measurements. Another problem is related to the trajectory 
representation. Some trajectories may have missing times- 
tamps or are measured at different time intervals. Therefore, 
the density measurements cannot be applied between trajec- 
tories with different timestamps. To address the problem of 
missing timestamps, the authors proposed to interpolate the 
trajectories by creating virtual time points and by applying 
density measurements on trajectory segments. Additionally, 
the convoy is defined as a candidate when it has at least k 
clusters during k consecutive timestamps. 

Recently, Zhenhui Li et al. [6] proposed the concept of
swarm and closed swarm and the ObjectGrowth algorithm to
extract closed swarm patterns. The ObjectGrowth method
is a depth-first-search framework based on the objectset
search space (i.e., the collection of all subsets of $O_{DB}$). For
the search space of $O_{DB}$, they perform a depth-first search of
all subsets of $O_{DB}$ through a pre-order tree traversal. Even
so, the search space for enumerating the objectsets
remains huge, i.e. $O(2^{|O_{DB}|})$. To speed up the search
process, they propose two pruning rules. The first pruning
rule, called Apriori Pruning, stops the traversal of a
subtree when further traversal cannot satisfy
$min_t$. The second pruning rule, called Backward Pruning,
makes use of the closure property. It checks whether there
is a superset of the current objectset which has the same
maximal corresponding timeset as that of the current one.
If so, the traversal of the subtree under the current objectset
is meaningless. After pruning the invalid candidates, the
remaining ones may or may not be closed swarms. Then a
Forward Closure Checking is used to determine whether a
pattern is a closed swarm or not.

In [21], Hwang et al. propose two algorithms to mine
group patterns, known as the Apriori-like Group Pattern
mining algorithm and the Valid Group-Growth algorithm. The
former explores the Apriori property of valid group patterns
and extends the Apriori algorithm [11] to mine valid group
patterns. The latter is based on an idea similar to the FP-growth
algorithm [27].

Recently in [29], A. Calmeron proposes a frequent itemset- 
based approach for flock identification purposes. 

Even though these approaches are very efficient, they
only extract a specific kind of pattern.
When considering a dataset, it is quite difficult for the decision
maker to know in advance the kind of pattern embedded
in the data. Therefore, an approach able
to automatically extract all these different kinds of patterns
can be very useful. This is the problem we address in this
paper and develop in the next sections.

3. SPATIO-TEMPORAL PATTERNS IN 
ITEMSET CONTEXT 

Extracting different kinds of patterns requires the use of
several algorithms. To deal with this problem, we propose
a unifying approach to extract and manage different kinds
of patterns.

Basically, patterns are evolutions of clusters over time.
Therefore, to manage the evolution of clusters, we need to 
analyse the correlations between them. Furthermore, if clus- 
ters share some characteristics (e.g. share some objects), 
they could be a pattern. Consequently, if a cluster is con- 
sidered as an item we will have a set of items (called item- 
set). The main problem essentially is to efficiently combine 
items (clusters) to find itemsets (a set of clusters) which 
share some characteristics or satisfy some properties to be 
considered a pattern. To describe cluster evolution, spatio- 
temporal data is presented as a cluster matrix from which 
patterns can be extracted. 

Definition 5. Cluster Matrix. Assume that we have
a set of clusters $C_{DB} = \{C_1, C_2, \ldots, C_n\}$ where $C_i =
\{c_{i_1 t_i}, c_{i_2 t_i}, \ldots, c_{i_m t_i}\}$ is the set of clusters at timestamp $t_i$.
A cluster matrix is thus a matrix of size $|O_{DB}| \times |C_{DB}|$.
Each row represents an object and each column represents
a cluster. The value of the cluster matrix cell $(o_i, c_j)$ is 1
(resp. empty) if $o_i$ is in (resp. is not in) cluster $c_j$. A cluster
(or item) $c_j$ is a cluster formed after applying clustering
techniques.

For instance, the data from the illustrative example (Figure 4)
is presented as a cluster matrix in Table 2. Object $o_1$ belongs
to the cluster $c_{11}$ at timestamp $t_1$. For clarity, in the
following, $c_{ij}$ represents the cluster $c_i$ at time $t_j$. Therefore,
the matrix cell $(o_1, c_{11})$ is 1, while the matrix cell
$(o_4, c_{11})$ is empty because object $o_4$ does not belong to cluster
$c_{11}$.
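A cluster matrix in the sense of Definition 5 can be sketched as a nested dictionary. The cluster memberships below are hypothetical (they are not the contents of Table 2), although they are chosen so that $o_1$ is in $c_{11}$ and $o_4$ is not; the function name `cluster_matrix` is likewise an assumption of this sketch.

```python
# Hypothetical clusters: item name -> set of member objects.
# "c12" stands for cluster c1 at timestamp t2, following the c_ij notation.
clusters = {
    "c11": {"o1", "o2", "o3"},
    "c21": {"o4"},
    "c12": {"o1", "o2"},
    "c22": {"o3", "o4"},
    "c13": {"o1", "o2", "o3"},
}
objects = ["o1", "o2", "o3", "o4"]

def cluster_matrix(objects, clusters):
    """Return {object: {cluster: 1}}; a missing key plays the role of an empty cell."""
    return {
        o: {c: 1 for c, members in clusters.items() if o in members}
        for o in objects
    }

matrix = cluster_matrix(objects, clusters)
```

Storing only the non-empty cells keeps the representation sparse, which suits data where each object belongs to at most one cluster per timestamp.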



Figure 5: A swarm from our example. 



Figure 6: A convoy from our example. 



By presenting data in a cluster matrix, each object
acts as a transaction while each cluster $c_j$ stands for an
item. Additionally, an itemset can be formed as $\Upsilon =
\{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$ with time $T_\Upsilon = \{t_{a_1}, t_{a_2}, \ldots, t_{a_p}\}$
where $t_{a_1} < t_{a_2} < \cdots < t_{a_p}$, $\forall a_i: t_{a_i} \in T_{DB}, c_{t_{a_i}} \in C_{a_i}$.
The support of the itemset $\Upsilon$, denoted $\sigma(\Upsilon)$, is the number
of common objects in every item belonging to $\Upsilon$,
$O(\Upsilon) = \bigcap_{i=1}^{p} c_{t_{a_i}}$. Additionally, the length of $\Upsilon$, denoted
$|\Upsilon|$, is the number of items or timestamps ($= |T_\Upsilon|$).

For instance, in Table 2, for a support value of 2 we have
$\Upsilon = \{c_{11}, c_{12}\}$ verifying $\sigma(\Upsilon) = 2$. Every item (resp. cluster)
of $\Upsilon$, $c_{11}$ and $c_{12}$, is in the transactions (resp. objects)
$o_1, o_2$. The length $|\Upsilon|$ is the number of items (= 2).

Naturally, the number of clusters can be large; however,
the maximum length of itemsets is $|T_{DB}|$. Because of the
density-based clustering algorithm used, clusters at the same
timestamp cannot be in the same itemset.
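The quantities $O(\Upsilon)$, $\sigma(\Upsilon)$ and $T_\Upsilon$ can be computed directly from the cluster memberships. The following sketch assumes hypothetical data and helper names (`objects_of`, `support`, `timestamps_of`); timestamps are represented by integer indices.

```python
# Hypothetical clusters: item -> (timestamp index, member objects).
clusters = {
    "c11": (1, {"o1", "o2", "o3"}),
    "c21": (1, {"o4"}),
    "c12": (2, {"o1", "o2"}),
    "c22": (2, {"o3", "o4"}),
    "c13": (3, {"o1", "o2", "o3"}),
}

def objects_of(itemset):
    """O(Y): objects common to every cluster (item) of a non-empty itemset."""
    return set.intersection(*(clusters[c][1] for c in itemset))

def support(itemset):
    """sigma(Y) = |O(Y)|."""
    return len(objects_of(itemset))

def timestamps_of(itemset):
    """T_Y: sorted timestamps of the clusters in the itemset."""
    return sorted(clusters[c][0] for c in itemset)
```

For example, `support(["c11", "c12"])` counts the objects that appear in both clusters, which mirrors the way support is counted over transactions in ordinary itemset mining.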

Now, we will define some useful properties to extract the 
patterns presented in Section 2 from frequent itemsets as 
follows: 

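Property 1. Swarm. Given a frequent itemset $\Upsilon =
\{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$, $(O(\Upsilon), T_\Upsilon)$ is a swarm if, and only
if:

$$
\begin{cases}
(1): \sigma(\Upsilon) \geq \varepsilon \\
(2): |\Upsilon| \geq min_t
\end{cases}
\tag{5}
$$

Proof. By construction, we have $\sigma(\Upsilon) \geq \varepsilon$ and
$\sigma(\Upsilon) = |O(\Upsilon)|$, then $|O(\Upsilon)| \geq \varepsilon$. Additionally, as $|\Upsilon| \geq
min_t$ and $|\Upsilon| = |T_\Upsilon|$, then $|T_\Upsilon| \geq min_t$. Furthermore,
$\forall t_{a_j} \in T_\Upsilon, O(\Upsilon) \subseteq c_{t_{a_j}}$, which means that at every timestamp
we have a cluster containing all objects in $O(\Upsilon)$. Consequently,
$(O(\Upsilon), T_\Upsilon)$ is a swarm because it satisfies all the
requirements of Definition 1. □

For instance, in Figure 5, for the frequent itemset $\Upsilon =
\{c_{11}, c_{13}\}$ we have $(O(\Upsilon) = \{o_1, o_2, o_3\}, T_\Upsilon = \{t_1, t_3\})$, which
is a swarm with support threshold $\varepsilon = 2$ and $min_t = 2$. We
can notice that $\sigma(\Upsilon) = 3 \geq \varepsilon$ and $|\Upsilon| = 2 \geq min_t$.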

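Essentially, a closed swarm is a swarm which satisfies the
object-closed and time-closed conditions; therefore, the
closed swarm property is as follows:

Property 2. Closed Swarm. Given a frequent itemset
$\Upsilon = \{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$, $(O(\Upsilon), T_\Upsilon)$ is a closed swarm if
and only if:

$$
\begin{cases}
(1): (O(\Upsilon), T_\Upsilon) \text{ is a swarm.} \\
(2): \nexists \Upsilon' \text{ s.t. } O(\Upsilon) \subset O(\Upsilon'), T_{\Upsilon'} = T_\Upsilon \text{ and } (O(\Upsilon'), T_\Upsilon) \text{ is a swarm.} \\
(3): \nexists \Upsilon' \text{ s.t. } O(\Upsilon') = O(\Upsilon), T_\Upsilon \subset T_{\Upsilon'} \text{ and } (O(\Upsilon), T_{\Upsilon'}) \text{ is a swarm.}
\end{cases}
\tag{6}
$$

Proof. By construction, we obtain $(O(\Upsilon), T_\Upsilon)$, which
is a swarm. Additionally, if $\nexists \Upsilon'$ s.t. $O(\Upsilon) \subset O(\Upsilon')$, $T_{\Upsilon'} = T_\Upsilon$
and $(O(\Upsilon'), T_\Upsilon)$ is a swarm, then $(O(\Upsilon), T_\Upsilon)$ cannot be
enlarged in terms of objects. Therefore, it satisfies the
object-closed condition. Furthermore, if $\nexists \Upsilon'$ s.t. $O(\Upsilon') =
O(\Upsilon)$, $T_\Upsilon \subset T_{\Upsilon'}$ and $(O(\Upsilon), T_{\Upsilon'})$ is a swarm, then $(O(\Upsilon), T_\Upsilon)$
cannot be enlarged in terms of lifetime. Therefore, it satisfies
the time-closed condition. Consequently, $(O(\Upsilon), T_\Upsilon)$ is
a swarm and it satisfies the object-closed and time-closed conditions,
and therefore $(O(\Upsilon), T_\Upsilon)$ is a closed swarm according
to Definition 2. □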

According to Definition 3, a convoy is a swarm
which satisfies the consecutiveness condition. Therefore, for
an itemset, we can extract a convoy if the following property
holds:

Property 3. Convoy. Given a frequent itemset $\Upsilon =
\{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$, $(O(\Upsilon), T_\Upsilon)$ is a convoy if and only
if:

$$
\begin{cases}
(1): (O(\Upsilon), T_\Upsilon) \text{ is a swarm.} \\
(2): \forall j, 1 \leq j < p: \ t_{a_j}, t_{a_{j+1}} \text{ are consecutive.}
\end{cases}
\tag{7}
$$

Proof. By construction, we obtain $(O(\Upsilon), T_\Upsilon)$, which
is a swarm. Additionally, if $\Upsilon$ satisfies condition (2), it
means that $\Upsilon$'s lifetime is consecutive. Consequently,
$(O(\Upsilon), T_\Upsilon)$ is a convoy according to Definition 3. □

For instance, see Table 2 and Figure 6, for the frequent
itemset $\Upsilon = \{c_{11}, c_{12}, c_{13}\}$ we have $(O(\Upsilon) = \{o_1, o_2\}, T_\Upsilon =
\{t_1, t_2, t_3\})$, which is a convoy with support threshold $\varepsilon = 2$ and
$min_t = 2$. Note that $o_3$ is not in the convoy.
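Properties 1 and 3 translate into two short predicates over an itemset. The sketch below is illustrative only: the data is hypothetical, the names `is_swarm` and `is_convoy` are not from the original work, and it assumes timestamps are integer indices so that "consecutive" means differing by exactly 1.

```python
# Hypothetical clusters: item -> (timestamp index, member objects).
clusters = {
    "c11": (1, {"o1", "o2", "o3"}),
    "c12": (2, {"o1", "o2"}),
    "c13": (3, {"o1", "o2", "o3"}),
}

def is_swarm(itemset, eps, min_t):
    """Property 1: sigma(Y) >= eps and |Y| >= min_t."""
    if not itemset:
        return False
    common = set.intersection(*(clusters[c][1] for c in itemset))
    return len(common) >= eps and len(itemset) >= min_t

def is_convoy(itemset, eps, min_t):
    """Property 3: a swarm whose sorted timestamps are pairwise consecutive."""
    if not is_swarm(itemset, eps, min_t):
        return False
    ts = sorted(clusters[c][0] for c in itemset)
    return all(b - a == 1 for a, b in zip(ts, ts[1:]))
```

With $\varepsilon = 2$ and $min_t = 2$, the itemset `["c11", "c13"]` passes `is_swarm` but fails `is_convoy` because its timestamps skip index 2, whereas `["c11", "c12", "c13"]` passes both.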

Recall that a group pattern is a set of disjointed
convoys which share the same objects, but in different time
intervals. Therefore, the group pattern property is as follows:

Property 4. Group Pattern. Given a frequent itemset
$\Upsilon = \{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$, a minimum weight $min_{wei}$, a minimum
number of convoys $min_c$ and a set of consecutive time
segments $T_S = \{s_1, s_2, \ldots, s_n\}$, $(O(\Upsilon), T_S)$ is a group pattern
if and only if:

$$
\begin{cases}
(1): |T_S| \geq min_c. \\
(2): \forall s_i, \ s_i \subseteq T_\Upsilon, \ |s_i| \geq min_t. \\
(3): \bigcap_{i=1}^{n} s_i = \emptyset, \ \bigcap_{i=1}^{n} O(s_i) = O(\Upsilon). \\
(4): \forall s \notin T_S, \ s \text{ is a convoy}, \ O(\Upsilon) \not\subseteq O(s). \\
(5): \dfrac{\sum_{i=1}^{n} |s_i|}{|T_{DB}|} \geq min_{wei}.
\end{cases}
\tag{8}
$$

Proof. If $|T_S| \geq min_c$ then we know that there are at least $min_c$
consecutive time intervals $s_i$ in $T_S$. Furthermore, if $\forall s_i, s_i \subseteq
T_\Upsilon$ then we have $O(\Upsilon) \subseteq O(s_i)$. Additionally, if $|s_i| \geq min_t$
then $(O(\Upsilon), s_i)$ is a convoy (Definition 3). Now, $T_S$ actually
is a set of convoys of $O(\Upsilon)$ and if $\bigcap_{i=1}^{n} s_i = \emptyset$ then $T_S$ is a
set of disjointed convoys. A little bit further, if $\forall s \notin T_S$, $s$
is a convoy and $O(\Upsilon) \not\subseteq O(s)$, then $\nexists T_{S'}$ s.t. $T_S \subset T_{S'}$
and $\bigcap_{i=1}^{|T_{S'}|} O(s_i) = O(\Upsilon)$. Therefore, $(O(\Upsilon), T_S)$ cannot
be enlarged in terms of number of convoys. Similarly, if
$\bigcap_{i=1}^{n} O(s_i) = O(\Upsilon)$ then $(O(\Upsilon), T_S)$ cannot be enlarged in
terms of objects. Consequently, $(O(\Upsilon), T_S)$ is a closed swarm
of disjointed convoys because $|O(\Upsilon)| \geq \varepsilon$, $|T_S| \geq min_c$ and
$(O(\Upsilon), T_S)$ cannot be enlarged (Definition 2). Finally, if
$(O(\Upsilon), T_S)$ satisfies condition (5) then it is a valid group
pattern due to Definition 4. □

Table 3: Periodic Cluster Matrix

As mentioned before, the main difference in periodic pat- 
tern mining is the input data while the definition is similar 
to that of closed swarm. The cluster matrix which is used 
for periodic mining can be defined as follows: 

Definition 6. Periodic Cluster Matrix (PCM). A periodic
cluster matrix is a cluster matrix with the following
differences: 1) an object $o$ is a sub-trajectory $st$, 2) $ST_{DB}$ is the set
of all sub-trajectories in the dataset.

For instance, see Table 3, an object's trajectory is decomposed
into 3 sub-trajectories and from them a periodic
cluster matrix can be generated by applying a clustering technique.
Assume that we can extract a frequent itemset
$\Upsilon = \{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$ from the periodic cluster matrix; the
periodic pattern can then be defined as follows:

Property 5. Periodic Pattern. Given a minimum weight and a frequent itemset
$\Upsilon = \{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$ which
is extracted from the periodic cluster matrix, $(ST(\Upsilon), T_\Upsilon)$ is
a periodic pattern if and only if $(ST(\Upsilon), T_\Upsilon)$ is a closed
swarm. Note that $ST(\Upsilon) = \bigcap_{i=1}^{p} c_{t_{a_i}}$.

Above, we presented some useful properties to extract 
spatio-temporal patterns from itemsets. Now we will fo- 
cus on the fact that from an itemset mining algorithm we 
are able to extract the set of all spatio-temporal patterns. 
We thus start the proof process by analyzing the swarm ex- 
tracting problem. This first lemma shows that from a set 
of frequent itemsets we are able to extract all the swarms 
embedded in the database. 

Lemma 1. Let $FI = \{\Upsilon_1, \Upsilon_2, \ldots, \Upsilon_l\}$ be the frequent
itemsets mined from the cluster matrix with $minsup =
\varepsilon$. All swarms $(O, T)$ can be extracted from $FI$.

Proof. Let us assume that $(O, T)$ is a swarm. Note that
$T = \{t_{a_1}, t_{a_2}, \ldots, t_{a_m}\}$. According to Definition 1 we
know that $|O| \geq \varepsilon$. If $(O, T)$ is a swarm then $\forall t_{a_i} \in T, \exists c_{t_{a_i}}$
s.t. $O \subseteq c_{t_{a_i}}$, therefore $\bigcap_{i=1}^{m} c_{t_{a_i}} = O$. Additionally, we
know that $\forall c_{t_{a_i}}$, $c_{t_{a_i}}$ is an item, so $\exists \Upsilon = \bigcup_{i=1}^{m} c_{t_{a_i}}$ is an
itemset and $O(\Upsilon) = \bigcap_{i=1}^{m} c_{t_{a_i}} = O$, $T_\Upsilon = \bigcup_{i=1}^{m} t_{a_i} = T$.
Therefore, $(O(\Upsilon), T_\Upsilon)$ is a swarm. So, $(O, T)$ is extracted
from $\Upsilon$. Furthermore, $\sigma(\Upsilon) = |O(\Upsilon)| = |O| \geq \varepsilon$, so $\Upsilon$
is a frequent itemset and $\Upsilon \in FI$. Finally, since $\forall (O, T)$ s.t.
$(O, T)$ is a swarm, $\exists \Upsilon$ s.t. $\Upsilon \in FI$ and $(O, T)$ can be
extracted from $\Upsilon$, we can conclude that every swarm $(O, T)$ can be
mined from $FI$. □

We can consider that by adding constraints such as
"consecutive lifetime", "time-closed", "object-closed" and "integrity
proportion" to swarms, we can retrieve convoys,
closed swarms and moving clusters. Therefore, if
$Swarm$, $CSwarm$, $Convoy$ and $MCluster$ respectively contain
all swarms, closed swarms, convoys and moving clusters, then
we have $CSwarm \subseteq Swarm$, $Convoy \subseteq Swarm$ and
$MCluster \subseteq Swarm$. By applying Lemma 1, we retrieve
all swarms from frequent itemsets. Since the set of closed
swarms, the set of convoys and the set of moving clusters are
subsets of the set of swarms, they can be completely extracted
from frequent itemsets. Additionally, all periodic
patterns can also be extracted because they are similar to
closed swarms. Now, we will consider group patterns and
show that all of them can be directly extracted from the
set of all frequent itemsets.

Lemma 2. Let $FI = \{\Upsilon_1, \Upsilon_2, \ldots, \Upsilon_l\}$ contain all frequent
itemsets mined from the cluster matrix with $minsup = \varepsilon$.
All group patterns $(O, T_S)$ can be extracted from $FI$.

Proof. For every valid group pattern $(O, T_S)$, there exists $T_S =
\{s_1, s_2, \ldots, s_n\}$, where $T_S$ is a set of disjointed convoys of $O$.
Therefore, $(O, T_{s_i})$ is a convoy and $\forall s_i \in T_S, \forall t \in T_{s_i}, \exists c_t$
s.t. $O \subseteq c_t$. Let $C_{s_i}$ be the set of clusters corresponding
to $s_i$; we know that $\exists \Upsilon$, $\Upsilon$ is an itemset, $\Upsilon = \bigcup_{i=1}^{n} C_{s_i}$
and $O(\Upsilon) = \bigcap_{i=1}^{n} O(C_{s_i}) = O$. Additionally, $(O, T_S)$ is
a valid group pattern; therefore, $|O| \geq \varepsilon$, so $|O(\Upsilon)| \geq \varepsilon$.
Consequently, $\Upsilon$ is a frequent itemset and $\Upsilon \in FI$ because
$\Upsilon$ is an itemset and $\sigma(\Upsilon) = |O(\Upsilon)| \geq \varepsilon$. Consequently,
$\forall (O, T_S), \exists \Upsilon \in FI$ s.t. $(O, T_S)$ can be extracted from $\Upsilon$ and
therefore all group patterns can be extracted from $FI$.
□

We have shown that patterns such as swarms, closed
swarms, convoys and group patterns can be similarly mapped
into the frequent itemset context. However, mining all frequent
itemsets is cost-prohibitive in some cases. Moreover, the
set of frequent closed itemsets has been proved to be a condensed
collection of frequent itemsets, i.e., both a concise
and lossless representation of a collection of frequent itemsets
[8, 9, 10, 24, 26, 30]. They are concise since a collection
of frequent closed itemsets is orders of magnitude smaller
than the corresponding collection of frequent itemsets. This allows
the use of very low minimum support thresholds. Moreover,
they are lossless, because it is possible to derive the identity
and the support of every frequent itemset in the collection
from them. Therefore, we only need to extract frequent
closed itemsets and then to scan them with the properties to
obtain the corresponding spatio-temporal patterns instead
of having to mine all frequent itemsets.
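The concise and lossless character of frequent closed itemsets can be illustrated on a toy transaction database. The sketch below is a brute-force illustration under stated assumptions: the transactions are hypothetical, the helper names (`tidset`, `closure`, `frequent_itemsets`, `support_from_closed`) are not from the original work, and exhaustive enumeration stands in for a real closed itemset miner such as those cited above.

```python
from itertools import combinations

# Hypothetical transactions: object -> set of clusters (items) containing it.
transactions = {
    "o1": {"c11", "c12", "c13"},
    "o2": {"c11", "c12", "c13"},
    "o3": {"c11", "c22", "c13"},
    "o4": {"c21", "c22"},
}

def tidset(itemset):
    """Objects (transactions) that contain every item of the itemset."""
    return {o for o, items in transactions.items() if itemset <= items}

def closure(itemset):
    """Largest itemset shared by all transactions containing the itemset."""
    tids = tidset(itemset)
    return frozenset(set.intersection(*(transactions[o] for o in tids)))

def frequent_itemsets(minsup):
    """Exhaustively list every itemset supported by at least minsup objects."""
    items = sorted(set().union(*transactions.values()))
    return [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if len(tidset(frozenset(c))) >= minsup
    ]

frequent = frequent_itemsets(minsup=2)
# An itemset is closed when it equals its own closure; keep its support.
closed = {x: len(tidset(x)) for x in frequent if closure(x) == x}

def support_from_closed(itemset):
    """Recover the support of a frequent itemset from the closed itemsets only."""
    return max(sup for c, sup in closed.items() if itemset <= c)
```

On this data there are eight frequent itemsets but only three closed ones, and `support_from_closed` returns the correct support for each of the eight, which is what "lossless" means here.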

4. FREQUENT CLOSED ITEMSET-BASED
SPATIO-TEMPORAL PATTERN MINING
ALGORITHM

In the previous section, patterns were redefined in the itemset
context. In this section, we propose two approaches, i.e.,
GeT_Move and Incremental GeT_Move, to efficiently extract
patterns. The global process is illustrated in Figure
7.

Figure 7: The main process.



In the first step, a clustering approach (Figure 7-(1)) is
applied at each timestamp to group objects into different
clusters. For each timestamp $t_a$, we have a set of clusters
$C_a = \{c_{1t_a}, c_{2t_a}, \ldots, c_{mt_a}\}$, with $1 \leq k \leq m$, $c_{kt_a} \subseteq O_{DB}$.
Spatio-temporal data can thus be converted to a cluster
matrix CM (Table 2).
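The conversion to a cluster matrix can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_cluster_matrix`, the input layout (a dictionary from timestamp to a list of clusters) and the encoding of an item as a `(timestamp, cluster_index)` pair are hypothetical choices, not part of the original implementation.

```python
from collections import defaultdict


def build_cluster_matrix(clusters_by_time):
    """Return a mapping object -> set of items, one row per object.

    An item is encoded as (timestamp, cluster_index); this encoding is
    an assumption made for the sketch.
    """
    matrix = defaultdict(set)
    for t, clusters in clusters_by_time.items():
        for k, members in enumerate(clusters, start=1):
            for obj in members:
                matrix[obj].add((t, k))
    return dict(matrix)


# Two timestamps: one cluster at t=1, two clusters at t=2.
clusters_by_time = {
    1: [{"o1", "o2", "o3"}],
    2: [{"o1", "o2"}, {"o3", "o4"}],
}
cm = build_cluster_matrix(clusters_by_time)
```

Each object plays the role of a transaction and each cluster the role of an item, which is the representation a frequent closed itemset miner expects.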

4.1 GeT_Move

After generating the cluster matrix CM, a frequent closed 
itemset mining algorithm is applied on CM to extract all the 
frequent closed itemsets. By scanning frequent closed item- 
sets and checking properties, we can obtain the patterns. 

In this paper, we apply the LCM algorithm [26, 30] to
extract frequent closed itemsets as it is known to be a very
efficient algorithm. The key feature of the LCM algorithm
is that after discovering a frequent closed itemset $X$, it
generates a new generator $X[i]$ by extending $X$ with a frequent
item $i$, $i \notin X$. Using a total order relation on frequent items,
LCM verifies if $X[i]$ violates this order by performing tests
using only the tidset² of $X$, called $T(X)$, and those of the
frequent items $i$. If $X[i]$ is not discarded, then $X[i]$ is an order
preserving generator of a new frequent closed itemset. Then,
its closure is computed using the previously mentioned
tidsets.

In this process, we discard some useless candidate
itemsets. In spatio-temporal patterns, items (resp. clusters)
must belong to different timestamps and therefore the items
(resp. clusters) which form a FCI must be in different
timestamps. In other words, no pattern can be extracted
by combining items in the same timestamp. Consequently,
FCIs which include more than one item in the same timestamp
are discarded.

Thanks to this pruning rule, the maximum length of the
frequent closed itemsets is bounded by the
number of timestamps $|T_{DB}|$. Additionally, the LCM search
space only depends on the number of objects (transactions)
$|O_{DB}|$ and the maximum length of itemsets $|T_{DB}|$.
Consequently, by using LCM and by applying the pruning rule,
GeT_Move is not affected by the number of clusters and
therefore the computing time can be greatly reduced.
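To make the pruning rule concrete, the sketch below enumerates frequent closed itemsets with a naive closure-based search. It is not LCM and makes no attempt at LCM's efficiency. The names `closure` and `mine_closed_itemsets` are hypothetical, items are assumed to be `(timestamp, cluster_index)` pairs, and each object is assumed to belong to at most one cluster per timestamp.

```python
def closure(itemset, transactions):
    """Return (closed itemset, supporting objects) for an itemset."""
    tids = frozenset(o for o, items in transactions.items() if itemset <= items)
    if not tids:
        return frozenset(), tids
    closed = frozenset.intersection(*(frozenset(transactions[o]) for o in tids))
    return closed, tids


def mine_closed_itemsets(transactions, eps):
    """Naive depth-first enumeration of closed itemsets with support >= eps."""
    items = sorted(set().union(*transactions.values()))
    found = {}

    def expand(current, start):
        used_times = {t for t, _ in current}
        for idx in range(start, len(items)):
            item = items[idx]
            # Pruning rule: never combine two items of the same timestamp.
            if item in current or item[0] in used_times:
                continue
            closed, tids = closure(current | {item}, transactions)
            if len(tids) < eps:
                continue
            found[closed] = tids
            expand(closed, idx + 1)

    expand(frozenset(), 0)
    return found


transactions = {
    "o1": {(1, 1), (2, 1)},
    "o2": {(1, 1), (2, 1)},
    "o3": {(1, 1), (2, 2)},
    "o4": {(2, 2)},
}
fcis = mine_closed_itemsets(transactions, eps=2)
```

The search may reach the same closed itemset along several paths; the dictionary keeps a single copy. LCM avoids this redundancy through its order-preserving generators.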

² Called tidlists in [24, 10] and denotations in [26, 30].

The pseudo code of GeT_Move is described in Algorithm 1.
The core of the GeT_Move algorithm is based on the LCM
algorithm, which has been slightly modified by adding the
pruning rule and by extracting patterns from FCIs. The
initial value of FCI $X$ is empty and we start by putting
item $i$ into $X$ (lines 2-3). By adding $i$ into $X$, we have $X[i]$
and if $X[i]$ is a FCI then $X[i]$ is used as a generator of
a new FCI by calling LCM_Iter($X$, $T(X)$, $i(X)$) (lines 4-5). In
LCM_Iter, we first check the properties of Section 3 (line 8) for
FCI $X$. Next, for each transaction $t \in T(X)$, we add all
items $j$, which are larger than $i(X)$ and satisfy the pruning
rule, into the occurrence sets $J[j]$ (lines 9-11). Next, for each
$j$ with $J[j] \neq \emptyset$, we check whether $J[j]$ is a FCI, and if so, we
call LCM_Iter again with the new generator (lines 12-14). After
terminating the call for $J[j]$, the memory for $J[j]$ is released
for future use in $J[k]$ for $k < j$ (line 15).

Regarding the PatternMining sub-function (lines 16-37),
the algorithm basically checks the properties of the itemset
$X$ to extract spatio-temporal patterns. If $X$ satisfies the
$min_t$ condition then $X$ is a closed swarm (lines 18-19).
After that, we check the consecutive time constraint for convoy
and moving cluster (lines 21-22). If the convoy
satisfies the $min_t$ condition and is correct in terms of the objects it
contains (line 31), we output the convoy (line 32). Next, we put the
convoy into the group pattern gPattern (line 33) and, at the end of
scanning $X$, output the group pattern if it satisfies the $min_c$ and
$min_{wei}$ conditions (line 37). Regarding the
moving cluster mc, we check the integrity at each pair of
consecutive timestamps (line 24). If mc satisfies the
condition then the previous item $x_k$ is merged into mc (line
25). If not, we check the $min_t$ condition for $mc \cup x_k$ and if
it is satisfied then we output $mc \cup x_k$ as a moving cluster.
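The closed swarm and convoy checks described above can be sketched as follows. This is a simplified illustration, not the PatternMining sub-function itself: moving clusters and group patterns are omitted, items are assumed to be `(timestamp, cluster_index)` pairs, and the names `consecutive_runs`, `support` and `extract_patterns` are hypothetical.

```python
def consecutive_runs(itemset):
    """Split an itemset into maximal runs of consecutive timestamps."""
    items = sorted(itemset)
    if not items:
        return []
    runs, current = [], [items[0]]
    for prev, nxt in zip(items, items[1:]):
        if nxt[0] == prev[0] + 1:
            current.append(nxt)
        else:
            runs.append(current)
            current = [nxt]
    runs.append(current)
    return runs


def support(itemset, transactions):
    """Objects whose transaction contains every item of the itemset."""
    return {o for o, items in transactions.items() if set(itemset) <= items}


def extract_patterns(itemset, transactions, min_t):
    """Return the closed swarm (if any) and the convoys found in an FCI."""
    objs = support(itemset, transactions)
    patterns = {"closed_swarm": None, "convoys": []}
    if len(itemset) >= min_t:
        patterns["closed_swarm"] = (objs, sorted(t for t, _ in itemset))
    for run in consecutive_runs(itemset):
        # A convoy must be long enough and keep exactly the objects of X.
        if len(run) >= min_t and support(run, transactions) == objs:
            patterns["convoys"].append((objs, [t for t, _ in run]))
    return patterns


transactions = {
    "o1": {(1, 1), (2, 1), (4, 1)},
    "o2": {(1, 1), (2, 1), (4, 1)},
    "o3": {(1, 1), (2, 1)},
}
x = frozenset({(1, 1), (2, 1), (4, 1)})
y = frozenset({(1, 1), (2, 1)})
px = extract_patterns(x, transactions, min_t=2)
py = extract_patterns(y, transactions, min_t=2)
```

In this example the itemset `x` yields a closed swarm over timestamps 1, 2 and 4 but no convoy, because its consecutive run covers a larger object set than `x` itself; the convoy is reported from `y` instead.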

4.2 Incremental GeT_Move 

In real world applications (cars, animal migration),
objects naturally tend to move together over short intervals,
while their movements can differ over long intervals.
Therefore, the number of items (clusters) can be large and
the FCIs can be long. For instance, in
Figure 8a, objects $\{o_1, o_2, o_3, o_4\}$ move together during the
first 100 timestamps; after that, $o_1, o_2$ stay together while
$o_3, o_4$ move together in another direction. The problem here
is that if we apply GeT_Move on the whole dataset, the
extraction of the itemsets can be very time consuming.

To deal with this issue, we propose the Incremental 
GeT_Move algorithm. The main idea is to split the trajec- 
tories (resp. cluster matrix CM) into short intervals, called 
blocks. By applying frequent closed itemset mining on each 
short interval, the data can then be compressed into local 
frequent closed itemsets. Additionally, the length of item- 
sets and the number of items can be greatly reduced. 

For instance, see Figure 8: if we consider $[t_1, t_{100}]$ as a
block and $[t_{101}, t_{200}]$ as another block, the maximum length
of itemsets in both blocks is 100 (instead of 200). Additionally,
the original data can be greatly compressed (e.g. Figure
8b) and only 3 items remain: $ci_{11}$, $ci_{12}$, $ci_{22}$. Consequently,
the process is much improved.

Definition 7. Block. Given a set of timestamps $T_{DB} =
\{t_1, t_2, \ldots, t_n\}$ and a cluster matrix CM, CM is vertically split
into equivalent (in terms of intervals) smaller cluster
matrices and each of them is a block $b$. Assume $T_b$ is the set
of timestamps of block $b$, $T_b = \{t_1, t_2, \ldots, t_k\}$; thus we have
$|T_b| = k \leq |T_{DB}|$.
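A vertical split of the cluster matrix into blocks can be sketched as follows. The function name `split_into_blocks` and the `(timestamp, cluster_index)` item encoding are assumptions made for illustration; when the number of timestamps is not a multiple of the block size, this sketch simply leaves the last block shorter.

```python
def split_into_blocks(cluster_matrix, timestamps, block_size):
    """Split a cluster matrix into blocks of block_size timestamps each."""
    ordered = sorted(timestamps)
    blocks = []
    for start in range(0, len(ordered), block_size):
        window = set(ordered[start:start + block_size])
        # Keep, for every object, only the items that fall in this window.
        block = {
            obj: {item for item in items if item[0] in window}
            for obj, items in cluster_matrix.items()
        }
        blocks.append(block)
    return blocks


cluster_matrix = {
    "o1": {(1, 1), (2, 1), (3, 1), (4, 2)},
    "o2": {(1, 1), (3, 1)},
}
blocks = split_into_blocks(cluster_matrix, [1, 2, 3, 4], block_size=2)
```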



Algorithm 1: GeT_Move

Input: Occurrence sets J, int ε, int min_t, set of items C_DB,
       double θ, int min_c, double min_wei

 1 begin
 2   X := I(T(∅)); //The root
 3   for i := 1 to |C_DB| do
 4     if |T(X[i])| ≥ ε and X[i] is closed then
 5       LCM_Iter(X[i], T(X[i]), i);
 6 LCM_Iter(X, T(X), i(X))
 7 begin
 8   PatternMining(X, min_t); /*X is a pattern?*/
 9   foreach transaction t ∈ T(X) do
10     foreach j ∈ t, j > i(X), j.time ∉ time(X) do
11       insert j to J[j];
12   foreach j, J[j] ≠ ∅ in the decreasing order do
13     if |T(J[j])| ≥ ε and J[j] is closed then
14       LCM_Iter(J[j], T(J[j]), j);
15     Delete J[j];
16 PatternMining(X, min_t)
17 begin
18   if |X| ≥ min_t then
19     output X; /*Closed Swarm*/
20   gPattern := ∅; convoy := ∅; mc := ∅;
21   for k := 1 to |X| − 1 do
22     if x_k.time = x_(k+1).time − 1 then
23       convoy := convoy ∪ x_k;
24       if |T(x_k) ∩ T(x_(k+1))| / |T(x_k) ∪ T(x_(k+1))| ≥ θ then
25         mc := mc ∪ x_k;
26       else
27         if |mc ∪ x_k| ≥ min_t then
28           output mc ∪ x_k; /*MovingCluster*/
29         mc := ∅;
30     else
31       if |convoy ∪ x_k| ≥ min_t and |T(convoy ∪ x_k)| = |T(X)| then
32         output convoy ∪ x_k; /*Convoy*/
33         gPattern := gPattern ∪ (convoy ∪ x_k);
34       if |mc ∪ x_k| ≥ min_t then
35         output mc ∪ x_k; /*MovingCluster*/
36       convoy := ∅; mc := ∅;
37   if |gPattern| ≥ min_c and size(gPattern) / |T_DB| ≥ min_wei then
38     output gPattern; /*Group Pattern*/
39 Where: X is an itemset, X[i] := X ∪ i, i(X) is the last
   item of X, T(X) is the list of transactions that X belongs to,
   J[j] := T(X[j]), j.time is the time index of item j,
   time(X) is the set of time indexes of X, |T(convoy)| is
   the number of transactions that the convoy belongs to,
   |gPattern| and size(gPattern) respectively are the
   number of convoys and the total length of the convoys
   in gPattern.



Assume that we obtain a set of blocks $B = \{b_1, b_2, \ldots, b_p\}$
with $|T_{b_1}| = |T_{b_2}| = \ldots = |T_{b_p}|$, $\bigcup_{i=1}^{p} b_i = CM$ and
$\bigcap_{i=1}^{p} b_i = \emptyset$. We are given a set of frequent closed itemset
collections $CI = \{CI_1, CI_2, \ldots, CI_p\}$ where $CI_i$ is mined from
block $b_i$. $CI$ is presented as a closed itemset matrix which is
formed by horizontally connecting all local frequent closed
itemsets: $CIM = \bigcup_{i=1}^{p} CI_i$.




(b) Data after applying frequent closed itemsets mining
on Blocks.

Figure 8: A case study example. (b) $ci_{11}$ (resp.
$ci_{12}$, $ci_{22}$) is a frequent closed itemset extracted from
block 1 (resp. block 2).



Table 4: Closed Itemset Matrix

Block B                           b_1      b_2
Frequent Closed Itemsets CI       ci_11    ci_12    ci_22
O_DB    o_1                       1                 1
        o_2                       1                 1
        o_3                       1        1
        o_4                       1        1


Definition 8. Closed Itemset Matrix (CIM). Closed 
itemset matrix is a cluster matrix with some differences as 
follows: 1) Timestamp t now becomes a block b. 2) Item c 
is a frequent closed itemset ci. 

For instance, see Table 4: we have two sets of frequent
closed itemsets $CI_1 = \{ci_{11}\}$, $CI_2 = \{ci_{12}, ci_{22}\}$ which are
respectively extracted from blocks $b_1$, $b_2$. Closed itemset
matrix $CIM = CI_1 \cup CI_2$ means that CIM is created by
horizontally connecting $CI_1$ and $CI_2$. Consequently, we have
CIM as in Table 4.
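The construction of the closed itemset matrix can be sketched as below, using the data of Table 4. The function name `build_closed_itemset_matrix`, the cluster labels such as `"c_1"`, and the encoding of a CIM item as a `(block_index, fci_index)` pair are hypothetical; the sketch only shows how local FCIs become the items of a new, much narrower matrix.

```python
def build_closed_itemset_matrix(local_fcis_per_block):
    """Build the CIM from per-block lists of (itemset, supporting objects).

    Returns (cim, labels): cim maps object -> set of (block, index) items,
    labels maps each (block, index) item back to its local FCI.
    """
    cim, labels = {}, {}
    for b, fcis in enumerate(local_fcis_per_block, start=1):
        for k, (itemset, objects) in enumerate(fcis, start=1):
            labels[(b, k)] = itemset
            for obj in objects:
                cim.setdefault(obj, set()).add((b, k))
    return cim, labels


# Block 1 holds one local FCI shared by all objects; block 2 holds two.
block1 = [(frozenset({"c_1"}), {"o1", "o2", "o3", "o4"})]
block2 = [
    (frozenset({"c_2"}), {"o3", "o4"}),
    (frozenset({"c_3"}), {"o1", "o2"}),
]
cim, labels = build_closed_itemset_matrix([block1, block2])
```

Here the block index plays the role of the timestamp, so the same-timestamp pruning rule carries over as a same-block rule when FCI mining is applied to the CIM.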

We have already provided blocks to compress original 
data. Now, by applying frequent closed itemset mining on 
closed itemset matrix CIM, we are able to retrieve all fre- 
quent closed itemsets from corresponding data. Note that 
items (in CIM) which are in the same block cannot be in 
the same frequent closed itemsets. 

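Lemma 3. Given a cluster matrix CM which is vertically
split into a set of blocks $B = \{b_1, b_2, \ldots, b_p\}$ so that $\forall \Gamma$, $\Gamma$ is
a frequent closed itemset and $\Gamma$ is extracted from CM, then
$\Gamma$ can be extracted from the closed itemset matrix CIM.

PROOF. Let us assume that $\forall b_i, \exists I_i$ is the set of items
belonging to $b_i$ and therefore we have $\bigcap_{i=1}^{|B|} I_i = \emptyset$. If
$\forall \Gamma$, $\Gamma$ is a FCI extracted from CM then $\Gamma$ is formed as
$\Gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_p\}$ where $\gamma_i$ is a set of items s.t. $\gamma_i \subseteq I_i$.
Additionally, $\Gamma$ is a FCI and $O(\Gamma) = \bigcap_{i=1}^{p} O(\gamma_i)$ then
$\forall O(\gamma_i), O(\Gamma) \subseteq O(\gamma_i)$. Furthermore, we have $|O(\Gamma)| \geq \varepsilon$;
therefore, $|O(\gamma_i)| \geq \varepsilon$ so $\gamma_i$ is a frequent itemset. Assume
that $\exists \gamma_i, \gamma_i \notin CI_i$; then $\exists \Psi, \Psi \in CI_i$ s.t. $\gamma_i \subset \Psi$ and $\sigma(\gamma_i) =
\sigma(\Psi)$, $O(\gamma_i) = O(\Psi)$. Note that $\Psi$, $\gamma_i$ are from $b_i$. Remember
that $O(\Gamma) = O(\gamma_1) \cap O(\gamma_2) \cap \ldots \cap O(\gamma_i) \cap \ldots \cap O(\gamma_p)$; then
we have: $\exists \Gamma'$ s.t. $O(\Gamma') = O(\gamma_1) \cap O(\gamma_2) \cap \ldots \cap O(\Psi) \cap \ldots \cap
O(\gamma_p)$. Therefore, $O(\Gamma') = O(\Gamma)$ and $\sigma(\Gamma') = \sigma(\Gamma)$.
Additionally, we know that $\gamma_i \subset \Psi$ so $\Gamma \subset \Gamma'$. Consequently,
we obtain $\Gamma \subset \Gamma'$ and $\sigma(\Gamma) = \sigma(\Gamma')$. Therefore, $\Gamma$ is not a
FCI. That violates the assumption and therefore we have: if
$\exists \gamma_i, \gamma_i \notin CI_i$ then $\Gamma$ is not a FCI. Finally, we can
conclude that $\forall \Gamma$, $\Gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_p\}$ is a FCI extracted from
CM, $\forall \gamma_i \in \Gamma$, $\gamma_i$ must belong to $CI_i$ and $\gamma_i$ is an item in the
closed itemset matrix CIM. Therefore, $\Gamma$ can be retrieved
by applying FCI mining on CIM. □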



Algorithm 2: Incremental GeT_Move

Input: Occurrence sets K, int ε, int min_t, double θ,
       set of Occurrence sets (blocks) B, int min_c,
       double min_wei

 1 begin
 2   K := ∅; CI := ∅; int item_total := 0;
 3   foreach b ∈ B do
 4     LCM(b, ε, I_b);
 5   GeT_Move(K, ε, min_t, CI, θ, min_c, min_wei);
 6 LCM(Occurrence sets J, int σ_0, set of items C)
 7 begin
 8   X := I(T(∅)); //The root
 9   for i := 1 to |C| do
10     if |T(X[i])| ≥ ε and X[i] is closed then
11       LCM_Iter(X[i], T(X[i]), i);
12 LCM_Iter(X, T(X), i(X))
13 begin
14   Update(K, X, T(X), item_total++);
15   foreach transaction t ∈ T(X) do
16     foreach j ∈ t, j > i(X), j.time ∉ time(X) do
17       insert j to J[j];
18   foreach j, J[j] ≠ ∅ in the decreasing order do
19     if |T(J[j])| ≥ ε and J[j] is closed then
20       LCM_Iter(J[j], T(J[j]), j);
21     Delete J[j];
22 Update(K, X, T(X), item_total)
23 begin
24   foreach t ∈ T(X) do
25     insert item_total into K[t];
26   CI := CI ∪ item_total;



By applying Lemma 3, we can obtain all the FCIs and,
from these itemsets, the patterns can be extracted. Note that
Incremental GeT_Move does not depend on the length
restriction $min_t$. The reason is that $min_t$ is only used
in the spatio-temporal pattern mining step. Whatever $min_t$ is
($min_t \geq$ block size or $min_t <$ block size), Incremental
GeT_Move can extract all the FCIs and therefore the final
results are the same.

The pseudo code of Incremental GeT_Move is described
in Algorithm 2. The main difference between the code
of Incremental GeT_Move and GeT_Move is the Update
sub-function. In this function, we generate, step by step,
the closed itemset matrix from the blocks (line 14 and lines
22-26). Next, we apply GeT_Move to extract patterns (line
5).



4.3 Toward A Parameter Free Incremental
GeT_Move Algorithm

Until now, we have presented Incremental GeT_Move,
which splits the original cluster matrix into different equivalent
blocks. The experimental results show that the algorithm
is efficient. However, the disadvantage of this approach is
that we do not know the optimal block size. To
identify the optimal block size, different techniques can be
applied, such as data sampling, in which a sample of the data
is used to investigate the optimal block size. Even if this
approach is appealing, extracting such a sample is very
difficult.

To tackle this problem, we propose an innovative solution 
to dynamically assign blocks to Incremental GeT_Move. Be- 
fore presenting the approach, we would like to propose the 
definition of a fully nested cluster matrix (resp. block) (Fig- 
ure 9c) as follows. 

Definition 9. Fully nested cluster matrix (resp. block).
An $n \times m$ 0-1 block $b$ is fully nested if for any two columns
$r_i$ and $r_{i+1}$, $r_i, r_{i+1} \in b$, we have $r_i \cap r_{i+1} = r_{i+1}$.
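Since $r_i \cap r_{i+1} = r_{i+1}$ holds exactly when $r_{i+1} \subseteq r_i$, the definition can be checked with a chain of subset tests. In the sketch below each column is assumed to be given as the set of row (object) identifiers holding a 1; the function name `is_fully_nested` is hypothetical.

```python
def is_fully_nested(columns):
    """True if every column is a subset of the column before it."""
    return all(nxt <= cur for cur, nxt in zip(columns, columns[1:]))


nested = [{1, 2, 3}, {1, 2}, {1}]
not_nested = [{1, 2}, {2, 3}]
```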

LCM is very efficient when applied
on dense (resp. (fully) nested) datasets and blocks.
Let $E$ be the universe of items, consisting of items $1, \ldots, n$.
A subset $X$ of $E$ is called an itemset. When LCM
processes a common cluster matrix, for any $X$, we make
the recursive call for $X[i]$ for each $i \in \{i(X) + 1, \ldots, |E|\}$
because we do not know which $X[i]$ will be a closed itemset
when $X$ is extended by adding $i$ to $X$. Meanwhile, for a
fully nested cluster matrix, we know that only the recursive
call for item $i = i(X) + 1$ is valuable and the other recursive
calls for each item $i \in \{i(X) + 2, \ldots, |E|\}$ are useless. Note
that $i(X)$ returns the last item of $X$.

Property 6. Recursive Call. Given a fully nested cluster
matrix nCM (resp. block), a universe of items $E$ of nCM and
an itemset $X$ which is a subset of $E$, all the FCIs can be
generated by making a recursive call of item $i = i(X) + 1$.

Proof. After construction, we have $\forall i \in E$, $O(i) \cap O(i + 1)
= O(i + 1)$; thus, $O(i + 1) \subseteq O(i)$. Additionally, $\forall i' \in
\{i(X) + 2, \ldots, |E|\}$ we need to make a recursive call for $X[i']$
and let us assume that we obtain a frequent itemset $X \cup i' \cup
X'$ with $X' \subseteq \{i(X) + 3, \ldots, |E|\}$. We can consider that
$O(i') \subseteq O(i(X) + 1)$ and therefore $O(X \cup i' \cup X') = O(X \cup
(i(X) + 1) \cup i' \cup X')$. Consequently, $X \cup i' \cup X'$ is not a
FCI because $(X \cup i' \cup X') \subset (X \cup (i(X) + 1) \cup i' \cup X')$ and
$O(X \cup i' \cup X') = O(X \cup (i(X) + 1) \cup i' \cup X')$. Furthermore,
$(X \cup (i(X) + 1) \cup i' \cup X')$ can be generated by making
a recursive call for $i(X) + 1$. We can conclude that it is
useless to make a recursive call for $\forall i' \in \{i(X) + 2, \ldots, |E|\}$
and additionally, all FCIs can be generated only by making
a recursive call for $i(X) + 1$. □

By applying Property 6, LCM is
more efficient on a fully nested matrix because it avoids
unnecessary recursive calls. Therefore, our goal is to
retrieve fully nested blocks to improve the performance of
Incremental GeT_Move. In order to reach this goal, we first
apply the nested and segment nested Greedy algorithm³ [31]
to re-arrange the cluster matrix (Figure 9a) so that it
becomes a nested cluster matrix (Figure 9b). Then, we
propose a sub-function, Nested Block Partition (Figure 7-(4)), to
dynamically split the nested cluster matrix into fully nested
blocks (Figure 9c).

³ http://www.aics-research.com/nestedness/

Figure 9: Examples of non-nested, almost nested,
fully nested datasets. Black = 1, white = 0. (a)
Original, (b) Almost nested, (c) Fully nested.

Table 5: An example of FCI binary presentation.

By following Definition 9 and scanning the nested cluster
matrix from the beginning to the end, we are able to
obtain all fully nested blocks. We start from the first
column of the nested cluster matrix, then we check the next
column: if the nested condition holds then the block is
expanded; otherwise, the block is closed and we create a new
block. Note that all small blocks containing only 1 column
are merged into a sparse block SpareB. At the end, we
obtain a set of fully nested blocks NestedB and a sparse block
SpareB. Finally, Incremental GeT_Move is applied on
$B = NestedB \cup SpareB$.

The pseudo code of the Fully Nested Blocks Partition
sub-function is described in Algorithm 3.

Algorithm 3: Fully Nested Blocks Partition

Input: a nested cluster matrix CM_N
Output: a set of blocks B

 1 begin
 2   B := ∅; NestedB := ∅; SpareB := ∅;
 3   foreach item i ∈ CM_N do
 4     if i ∩ (i + 1) = (i + 1) then
 5       NestedB := NestedB ∪ i;
 6     else
 7       NestedB := NestedB ∪ i;
 8       if |NestedB| ≤ 1 then
 9         SpareB.push_all(NestedB);
10         NestedB := ∅;
11       else
12         B := B ∪ NestedB;
13         NestedB := ∅;
14   return B := B ∪ SpareB;
15 where the purpose of the SpareB.push_all(NestedB) function
   is to put all items in NestedB to SpareB.
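The partition step can be sketched in a few lines. As an assumption of this sketch, each column is given as the set of objects holding a 1, columns are identified by their position, and the last column always closes the block in progress (a case the pseudo code leaves implicit). The function name `partition_fully_nested` is hypothetical.

```python
def partition_fully_nested(columns):
    """Split an ordered list of columns into fully nested blocks.

    Returns (blocks, spare): blocks is a list of lists of column indexes,
    spare collects the indexes of all single-column blocks.
    """
    blocks, spare, current = [], [], []
    for idx, col in enumerate(columns):
        current.append(idx)
        is_last = idx == len(columns) - 1
        # Close the block when the next column is not nested in this one.
        if is_last or not (columns[idx + 1] <= col):
            if len(current) <= 1:
                spare.extend(current)
            else:
                blocks.append(current)
            current = []
    return blocks, spare


columns = [{1, 2, 3}, {1, 2}, {1}, {4, 5}, {6}, {6}]
blocks, spare = partition_fully_nested(columns)
```

In this example columns 0 to 2 and columns 4 to 5 form fully nested blocks, while column 3 stands alone and goes to the sparse block.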



4.4 Spatio-Temporal Pattern Mining Algorithm
Based on Explicit Combination of
FCI Pairs

In real world applications (e.g. cars), object locations are
continuously reported by using the Global Positioning System
(GPS). Therefore, new data is always available. Let us
denote the new movement data as $(O_{DB}, T_{DB'})$. Naturally,
it is cost-prohibitive and time consuming to execute
Incremental GeT_Move (or GeT_Move) on the entire database
(denoted $(O_{DB}, T_{DB} \cup T_{DB'})$), which is created by merging
$(O_{DB}, T_{DB'})$ into the existing database $(O_{DB}, T_{DB})$. To
tackle this issue, we provide an approach which efficiently
combines the existing frequent closed itemsets $FCIs_{DB}$ with
the new frequent closed itemsets $FCIs_{DB'}$, which are
extracted from $DB'$, to obtain the final results $FCIs_{DB \cup DB'}$.

For instance, in Table 5, we have two sets of frequent
closed itemsets $FCIs_{DB}$ and $FCIs_{DB'}$. Each FCI is
presented as a $|O_{DB}|$-bit binary numeral. For the sake of clarity,
the binary presentation of a FCI is used when applying binary
operators (i.e. $\vee$, $\wedge$, etc.). For instance, $b(ci_1) \vee b(ci'_1)$ is
represented by $ci_1 \vee ci'_1$. On the other hand, FCIs are
considered as lists of items (resp. clusters) when set operators
(i.e. $\cup$, $\cap$, $\subset$, $\in$, etc.) are applied.
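The binary presentation can be illustrated with plain integers used as bit vectors. In this sketch bit $i$ is set when the $i$-th object supports the FCI; the helper names `to_bits` and `size`, and the bit ordering, are assumptions made for illustration.

```python
def to_bits(objects, all_objects):
    """Encode the supporting objects of a FCI as an integer bit vector."""
    return sum(1 << i for i, o in enumerate(all_objects) if o in objects)


def size(bits):
    """Number of 1 bits, i.e. the support of the encoded FCI."""
    return bin(bits).count("1")


all_objects = ["o1", "o2", "o3", "o4"]
ci = to_bits({"o1", "o2", "o3"}, all_objects)
ci_prime = to_bits({"o2", "o3", "o4"}, all_objects)
gamma = ci & ci_prime  # objects shared by both FCIs
```

With this encoding the binary AND of two FCIs directly gives the set of objects that support both of them.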

The principal function of our algorithm is to explicitly
combine all pairs of FCIs $(ci, ci')$ to generate new FCIs. Let
us assume that $ci \wedge ci' = \gamma$; then $ci \cup ci'$ is a FCI if $\sigma(\gamma)$ is
no smaller than $\varepsilon$ and there is no FCI whose object set is a proper
subset of $O(ci)$ (resp. $O(ci')$) and a superset of $O(\gamma)$. The explicit
combination of a pair of FCIs $(ci, ci')$ is stated as follows:

Property 7. Explicit Combination of a pair of FCIs.
Given FCIs $ci$ and $ci'$ so that $ci \in FCIs_{DB}$, $ci' \in FCIs_{DB'}$,
$ci \cup ci'$ is a FCI that belongs to $FCIs_{DB \cup DB'}$ if and only
if:

if $ci \wedge ci' = \gamma$ then
(1): $Size(\gamma) \geq \varepsilon$.
(2): $\nexists p : p \in FCIs_{DB}, O(\gamma) \subseteq O(p) \subset O(ci)$.
(3): $\nexists p' : p' \in FCIs_{DB'}, O(\gamma) \subseteq O(p') \subset O(ci')$.        (9)

where $ci = \{c_{t_{a_1}}, c_{t_{a_2}}, \ldots, c_{t_{a_p}}\}$ and $ci' =
\{c'_{t_{a'_1}}, c'_{t_{a'_2}}, \ldots, c'_{t_{a'_q}}\}$, and $Size(ci)$ returns the number of
'1' in $ci$. Note that $Size(\gamma) = |O(\gamma)| = \sigma(\gamma)$.

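Proof. After construction, we have $\nexists p : p \in
FCIs_{DB}, O(\gamma) \subseteq O(p) \subset O(ci)$. We assume that $\exists i$
s.t. $i \in C_{DB}$, $O(\gamma) \subseteq i$ and $i \notin ci$; therefore $\exists p$ s.t.
$p = \{\forall i \mid i \in C_{DB}, O(\gamma) \subseteq i, i \notin ci\} \cup ci$, $O(\gamma) \subseteq O(p)$.
Consequently, $\forall i \in C_{DB}, O(\gamma) \subseteq i$ then $i \in p$ and
therefore $p$ is a FCI and $p \in FCIs_{DB}$. This violates the
assumption and therefore $\nexists i$ s.t. $i \in C_{DB}$, $O(\gamma) \subseteq i$ and
$i \notin ci$, or $\forall i$ s.t. $i \in C_{DB}$, $O(\gamma) \subseteq i$ then $i \in ci$.
Similarly, if $\nexists p' : p' \in FCIs_{DB'}, O(\gamma) \subseteq O(p') \subset O(ci')$ then
$\forall i'$ s.t. $i' \in C_{DB'}$, $O(\gamma) \subseteq i'$ then $i' \in ci'$. Consequently,
if $\forall i \in C_{DB \cup DB'}$, $O(\gamma) \subseteq i$ then $i \in ci \cup ci'$.
Additionally, $Size(\gamma) = \sigma(\gamma) \geq \varepsilon$ and therefore $ci \cup ci'$ is a FCI and
$ci \cup ci' \in FCIs_{DB \cup DB'}$.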

If $ci \cup ci'$ is a FCI, then $ci$ and $ci'$ must
respectively be the two longest FCIs which contain $O(\gamma)$ in
$FCIs_{DB}$ and $FCIs_{DB'}$. $(O(\gamma), ci \cup ci')$ is a new FCI and
it is stored in a set of new frequent closed itemsets,
named $FCIs_{new}$. To efficiently make all combinations, we
first partition $FCIs_{DB}$, $FCIs_{DB'}$ and $FCIs_{new}$ into
different partitions in terms of support, so that the FCIs that have
the same support value are in the same partition
(Figure 10).

Figure 10: An example of the explicit combination
of pairs of FCIs-based approach.

Secondly, partitions are combined from the smallest
support values (resp. longest FCIs) to the largest ones
(resp. shortest FCIs). New FCIs are added into the right
partition in $FCIs_{new}$. By using this approach, it is
guaranteed that the first time there is $ci \wedge ci' = \gamma$, $Size(\gamma) \geq \varepsilon$,
then $ci \cup ci'$ is a new FCI because they are the two longest
FCIs which contain $O(\gamma)$. Therefore, we just ignore the later
combinations which return $\gamma$ as the result. Furthermore, to
check whether $\gamma$ already exists in $FCIs_{new}$, we only
need to check the items in the $FCIs_{new}$ partition whose
support value is equal to $Size(\gamma)$. By
partitioning $FCIs_{DB}$, $FCIs_{DB'}$ and $FCIs_{new}$, the process
is much improved. Additionally, we also propose a pruning
rule to speed up the approach by ending the combination
running of a FCI $ci'$ as follows:

Lemma 4. The combination running of $ci'$ is stopped if:
$\exists ci \in FCIs_{DB}$ s.t. $ci \wedge ci' = ci'$, $ci \cup ci'$ is a FCI. (10)

Proof. Assume that $\exists \Gamma : \Gamma \in FCIs_{DB}$, $\sigma(\Gamma) \geq
\sigma(ci)$, $\Gamma \wedge ci' = ci'$. If $O(ci) \subset O(\Gamma)$ then we have
$ci \in FCIs_{DB}$, $O(ci') \subseteq O(ci) \subset O(\Gamma)$ and this violates
condition 2 in Property 7; therefore $\Gamma \cup ci'$ is not a FCI. If
$O(ci) \not\subset O(\Gamma)$ then $\exists i \in C_{DB}$ s.t. $O(ci') \subseteq i$ and $i \notin \Gamma$.
Furthermore, $\exists p : p = \{\forall i \mid i \in C_{DB}, O(ci') \subseteq i, i \notin \Gamma\} \cup \Gamma$. So,
$\forall i, i \in C_{DB}$, $O(ci') \subseteq i$ then $i \in p$ and therefore $p$ is a FCI
and $p \in FCIs_{DB}$. Additionally, $O(ci') \subseteq O(p) \subset O(\Gamma)$.
This violates condition 2 in Property 7; therefore $\Gamma \cup ci'$
is not a FCI. Consequently, we can conclude that $\nexists \Gamma$ s.t.
$\Gamma \in FCIs_{DB}$, $\sigma(\Gamma) \geq \sigma(ci)$, $\Gamma \wedge ci' = ci'$ and $\Gamma \cup ci'$ is a
FCI. Therefore, we do not need to continue the combination
running of $ci'$. □

Similar to Lemma 4, in the explicit combination process, $ci$
is deactivated for further combinations when there is a
$ci'$ so that $ci \wedge ci' = ci$ and $ci \cup ci'$ is a FCI. After generating
all new FCIs in $FCIs_{new}$, the final result $FCIs_{DB \cup DB'}$ is
created by collecting the FCIs in $FCIs_{DB}$, $FCIs_{DB'}$ and $FCIs_{new}$.
In this step, some of them are discarded according to Property 8.



Note that during the explicit combination step, the FCIs
which will not be selected for the final results are
removed by applying the conditions in Property 8. It means
that we only add suitable FCIs into $FCIs_{DB \cup DB'}$ and
therefore the step is optimized and much less costly. In the
worst case scenario, the complexity of the explicit
combination of pairs of FCIs step is $O(|FCIs_{DB}| \times |FCIs_{DB'}| \times
\#Partition(FCIs_{new}))$. Naturally, $T_{DB'}$ is much smaller
than $T_{DB}$ and therefore $FCIs_{DB'}$ and $FCIs_{new}$ are very small
compared to $FCIs_{DB}$. Consequently, the process can
potentially be greatly improved when compared to
executing Incremental GeT_Move on the entire database
$(O_{DB}, T_{DB \cup DB'})$.

The pseudo code of the Explicit Combination of Pairs of
FCIs-based Spatio-Temporal Pattern Mining Algorithm is
described in Algorithm 4.



Algorithm 4: Explicit Combination of Pairs of
FCIs-based Spatio-Temporal Pattern Mining Algorithm

Input: a set of FCIs FCIs_DB, Occurrence sets K, int
       ε, int min_t, double θ, set of Occurrence sets
       (blocks) B', int min_c, double min_wei

 1 begin
 2   FCIs_DB' := ∅; FCIs_new := ∅; FCIs_DB∪DB' := ∅;
 3   FCIs_DB' := Incremental GeT_Move*(K, ε, min_t, CI, θ, B', min_c, min_wei);
 4   foreach partition P' ∈ FCIs_DB' do
 5     foreach FCI ci' ∈ P' do
 6       foreach partition P ∈ FCIs_DB do
 7         foreach FCI ci ∈ P do
 8           γ := ci ∧ ci';
 9           if Size(γ) ≥ ε and FCIs_new.notContain(γ, Size(γ)) then
10             γ := ci ∪ ci';
11             FCIs_new.add(γ, Size(γ));
12             if γ = ci then
13               FCIs_DB.remove(ci);
14             if γ = ci' then
15               FCIs_DB'.remove(ci');
16               go to line 5;
17   FCIs_DB∪DB' := FCIs_DB ∪ FCIs_DB' ∪ FCIs_new;
18   foreach FCI X ∈ FCIs_DB∪DB' do
19     PatternMining(X, min_t); /*X is a pattern?*/
20 Where: Incremental GeT_Move* is Incremental
   GeT_Move without the PatternMining sub-function, and
   FCIs_new.notContain(γ, Size(γ)) returns true if there
   does not exist γ in the partition whose support
   value is Size(γ).

Property 8. Discarded FCIs in the $FCIs_{DB \cup DB'}$ creating
step. All the FCIs which satisfy the following conditions will
not be selected as FCIs in the final results.

(1): $\forall ci \in FCIs_{DB}$, if $\exists ci' \in FCIs_{DB'}$ s.t.
$ci \wedge ci' = ci$ then $ci$ will not be selected.
(2): $\forall ci' \in FCIs_{DB'}$, if $\exists ci \in FCIs_{DB}$ s.t.
$ci \wedge ci' = ci'$ then $ci'$ will not be selected.
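The pairwise combination and the discarding conditions of Property 8 can be sketched as below. This is a simplified illustration and not the authors' implementation: FCIs are assumed to be given as `(itemset, object bit vector)` pairs, sorting by support stands in for the explicit partitions, the early stop of Lemma 4 is omitted, and the names `popcount` and `combine` are hypothetical.

```python
def popcount(bits):
    """Number of objects encoded in a bit vector."""
    return bin(bits).count("1")


def combine(fcis_db, fcis_db_new, eps):
    """Combine two FCI collections; return a dict itemset -> object bits."""
    by_support = lambda fci: popcount(fci[1])
    old = sorted(fcis_db, key=by_support)      # smallest support first
    new = sorted(fcis_db_new, key=by_support)
    created = {}                               # gamma bits -> new itemset
    drop_old, drop_new = set(), set()
    for items_n, bits_n in new:
        for items_o, bits_o in old:
            gamma = bits_o & bits_n
            if gamma == bits_o:                # Property 8, condition (1)
                drop_old.add(items_o)
            if gamma == bits_n:                # Property 8, condition (2)
                drop_new.add(items_n)
            # Only the first pair producing gamma yields the new FCI.
            if popcount(gamma) >= eps and gamma not in created:
                created[gamma] = items_o | items_n
    result = {i: b for i, b in old if i not in drop_old}
    result.update({i: b for i, b in new if i not in drop_new})
    result.update({items: gamma for gamma, items in created.items()})
    return result


# Objects o1..o4 are mapped to bits 0..3.
fcis_db = [(frozenset({"c11", "c21"}), 0b0011), (frozenset({"c11"}), 0b0111)]
fcis_db_new = [(frozenset({"c31", "c41"}), 0b0110), (frozenset({"c31"}), 0b1111)]
merged = combine(fcis_db, fcis_db_new, eps=2)
```

In this small example three new FCIs are created, both FCIs of the existing database and one FCI of the new data are discarded, and the result agrees with mining the merged data directly.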



5. EXPERIMENTAL RESULTS 

A comprehensive performance study has been conducted 
on real datasets and synthetic datasets. All the algorithms 
are implemented in C++, and all the experiments are car- 
ried out on a 2.8GHz Intel Core i7 system with 4GB Memory. 
The system runs Ubuntu 11.10 and g++ version 4.6.1. 

The implementation of our proposed algorithms is
also integrated in our demonstration system, which is publicly
available online⁴. As in [6], the following two datasets⁵ have been
used during the experiments. The Swainsoni dataset includes 43
objects evolving over time and 764 different timestamps; it
was generated from July 1995 to June 1998. The Buffalo
dataset concerns 165 buffalos tracked from
year 2000 to year 2006; the original data has 26610
reported locations and 3001 timestamps.

Similar to [6, 3, 16], we first use linear interpolation to
fill in the missing data. For study purposes, we needed the
objects to stay together for at least $min_t$ timestamps. As in
[6, 3, 16], DBScan [5] ($MinPts = 2$, $Eps = 0.001$) is applied
to generate clusters at each timestamp.
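The interpolation step can be sketched as follows. The function name `fill_missing` and the input layout (a dictionary from integer timestamp to an `(x, y)` location) are assumptions made for illustration; the exact preprocessing used in the experiments is not specified beyond being linear.

```python
def fill_missing(track):
    """Linearly interpolate locations for timestamps missing between reports."""
    known = sorted(track)
    filled = dict(track)
    for t0, t1 in zip(known, known[1:]):
        (x0, y0), (x1, y1) = track[t0], track[t1]
        for t in range(t0 + 1, t1):
            w = (t - t0) / (t1 - t0)  # fraction of the gap already covered
            filled[t] = (x0 + w * (x1 - x0), y0 + w * (y1 - y0))
    return filled


track = {1: (0.0, 0.0), 3: (2.0, 4.0), 4: (3.0, 4.0)}
filled = fill_missing(track)
```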

5.1 Effectiveness 

We proved that mining spatio-temporal patterns can be
mapped into an itemset mining problem. Therefore, in
theory, our approaches provide the correct
results. Experimentally, we carry out a further comparison: we
first obtain the spatio-temporal patterns by employing
traditional algorithms such as CMC and CuTS*⁶ (convoy
mining) and ObjectGrowth (closed swarm mining), as well as our
approaches. To apply our algorithms, we split the cluster
matrix into blocks such that each block $b$ contains 25 consecutive
timestamps. Additionally, to retrieve all the spatio-temporal
patterns, in the reported experiments, the default value of
$\varepsilon$ is set to 2 (two objects can form a pattern) and $min_t$ is 1.
Note that the default values are the hardest conditions for
examining the algorithms. In the following we mainly
focus on different values of $min_t$ in order to obtain different
sets of convoys, closed swarms and group patterns. Note
that for group patterns, $min_c$ is 1 and $min_{wei}$ is 0.

The results show that our proposed approaches obtain the
same results as the traditional algorithms. An
example of patterns is illustrated in Figure 11. For instance,
see Figure 11a: a closed swarm is discovered within a
frequent closed itemset. Furthermore, from the itemset, a
convoy and a group pattern are also extracted (i.e. Figures 11b,
11c).

5.2 Efficiency 

5.2.1 Incremental GeT_Move and GeT_Move
Efficiency

To show the efficiency of our algorithms, we also generate
larger synthetic datasets using Brinkhoff's network⁷-based
generator of moving objects as in [6]. We generate 500
objects ($|O_{DB}| = 500$) for $10^4$ timestamps ($|T_{DB}| = 10^4$) using
the generator's default map with low moving speed (250).
There are $5 \times 10^6$ points in total. DBScan ($MinPts =
3$, $Eps = 300$) is applied to obtain clusters for each
timestamp.

In the efficiency comparison, we employ CMC, CuTS*
and ObjectGrowth. Note that, in [6], ObjectGrowth
outperforms VG-Growth [21] (a group pattern mining algorithm)
in terms of performance and therefore we only consider
ObjectGrowth. Note that GeT_Move and
Incremental GeT_Move extract closed swarms, convoys
and group patterns, whereas CMC and CuTS* only extract
convoys and ObjectGrowth only extracts closed swarms.

⁴ www.lirmm.fr/~phan/index.jsp
⁵ http://www.movebank.org
⁶ The source code of CMC, CuTS* is available at
http://lsirpeople.epfl.ch/jeung/sourcecodes.htm
⁷ http://iapg.jade-hs.de/personen/brinkhoff/generator/

Efficiency w.r.t. $\varepsilon$, $min_t$. Figures 12a, 13a and 14a show the
running time w.r.t. $\varepsilon$. It is clear that our approaches
outperform the other algorithms. ObjectGrowth is the slowest one and
the main reason is that with low $min_t$ (default $min_t = 1$),
the Apriori Pruning rule (the most efficient pruning rule) is
no longer effective. Therefore, the search space is greatly
enlarged ($2^{|O_{DB}|}$ in the worst case). Additionally, there is no
pruning rule for $\varepsilon$ and therefore the change of $\varepsilon$ does not
directly affect the running time of ObjectGrowth.
Furthermore, GeT_Move is slightly slower than Incremental GeT_Move.




Figure 11: An example of patterns discovered from the
Swainsoni dataset. (a) One of the discovered closed swarms.
(b) One of the discovered convoys. (c) One of the discovered
group patterns.

Figure 12: Running time on Swainsoni Dataset.

Figure 13: Running time on Buffalo Dataset.

Figure 14: Running time on Synthetic Dataset.

Figure 15: Number of patterns on Swainsoni Dataset. Note that # of frequent closed itemsets is equal to #
of closed swarms.

Figure 16: Number of patterns on Buffalo Dataset. (a) # of patterns w.r.t. $\varepsilon$, (b) # of patterns w.r.t. $min_t$,
(c) # of patterns w.r.t. $|O_{DB}|$, (d) # of patterns w.r.t. $|T_{DB}|$. Note that # of frequent closed itemsets is equal to # of
closed swarms.

Figure 17: Number of patterns on Synthetic Dataset. Note that # of frequent closed itemsets is equal to #
of closed swarms.



The main reason is that GeT_Move has to process a
large number of items and long itemsets. Meanwhile, thanks
to blocks, the number of items in Incremental GeT_Move is greatly
reduced and its itemsets are not as long as the ones in GeT_Move.

Figures 12b, 13b and 14b show the running time w.r.t. $min_t$.
In almost all cases, our approaches outperform the other
algorithms. See Figures 13b and 14b: with low $min_t$, our
algorithms are much faster than the others. However, when $min_t$
is higher ($min_t > 200$ in Figure 13b, $min_t > 20$ in
Figure 14b) our algorithms take more time than CuTS* and
ObjectGrowth. This is because with a high value of $min_t$,
the number of patterns is significantly reduced (Figures 15b,
16b, 17b) (i.e. no convoy is extracted when $min_t > 100$ (resp.
$min_t > 200$, $min_t > 10$), Figure 15b (resp. 16b, 17b)) and
therefore CuTS* and ObjectGrowth are faster. Meanwhile,
GeT_Move and Incremental GeT_Move still have to work with
frequent closed itemsets.

Efficiency w.r.t. $|O_{DB}|$, $|T_{DB}|$. Figures 12c-d,
13c-d and 14c-d show the running time when varying
$|O_{DB}|$ and $|T_{DB}|$ respectively. In all figures, Incremental
GeT_Move outperforms the other algorithms. However, with
synthetic data (Figure 14d) and the lowest values $\varepsilon = 2$ and
$min_t = 1$, GeT_Move is a little bit faster than Incremental
GeT_Move. This suggests that Incremental
GeT_Move does not have any information with which to obtain
better partitions (blocks).

Scalability w.r.t. ε. The running time of the algorithms
does not change significantly when varying min_t, |O_DB| and
|T_DB| on synthetic data (Figure 14). However, it varies
considerably with ε (default min_t = 1). Therefore, we
generate another large synthetic dataset to test the
scalability of the algorithms w.r.t. ε. The dataset includes
50,000 objects moving during 10,000 timestamps and contains
500 million locations in total. The executions of CMC and
CuTS* stop due to a lack of memory capacity after processing
300 million locations. Additionally, ObjectGrowth cannot
provide results after one day of running. The main reason is
that with low min_t (= 1), the search space is significantly
larger (≈ 2^50,000). Meanwhile, thanks to the LCM approach,
our algorithms can provide the results within hours
(Figure 18).
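
The dataset figures quoted above can be checked with a few lines of arithmetic. This sketch assumes one recorded location per object per timestamp, and it reads 2^50,000 as the number of possible object subsets; the variable names are illustrative only.

```python
import math

n_objects = 50_000
n_timestamps = 10_000

# One location per object per timestamp (assumption stated above).
n_locations = n_objects * n_timestamps

# Number of decimal digits of 2**n_objects, computed with logarithms
# rather than by materialising the integer as a string.
search_space_digits = math.floor(n_objects * math.log10(2)) + 1

print(n_locations)
print(search_space_digits)
```

The search space is therefore a number with roughly fifteen thousand decimal digits, which is why exhaustive enumeration over object sets is not feasible at this scale.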

Figure 18: Running time w.r.t. ε on large Synthetic Dataset.
Figure 19: Running time w.r.t. block size.

Efficiency w.r.t. Block-size. To investigate the optimal
value of block-size, we examine Incremental GeT_Move using
the default values of ε and min_t with different block-size
values on the real datasets and a synthetic dataset
(|O_DB| = 500, |T_DB| = 1,000). The optimal block-size ranges
from 20 to 30 timestamps, within which Incremental GeT_Move
obtains the best performance on all the datasets (Figure 19).
The main reason is that objects tend to move together over
suitably short intervals (from 20 to 30 timestamps).
Therefore, by setting block-size in this range, the data is
efficiently compressed into FCIs. Meanwhile, with larger
block-size values, the objects' movements are quite
different; therefore, the data compression is less efficient.
With small block-size values (5-15), we face a large number
of blocks, which slows the process down. In the previous
experiments, block-size is set to 25.
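
The trade-off between block-size and the number of blocks can be sketched as follows. This is a minimal illustration that assumes blocks are formed from fixed-size windows of consecutive timestamps and that every cluster (item) carries the timestamp at which it was observed; `split_into_blocks` and `number_of_blocks` are hypothetical helpers, not functions of the actual implementation.

```python
import math


def split_into_blocks(cluster_timestamps, block_size):
    """Group cluster ids into blocks of `block_size` consecutive timestamps.

    cluster_timestamps maps a cluster id to its (1-based) timestamp.
    """
    blocks = {}
    for cluster, t in cluster_timestamps.items():
        blocks.setdefault((t - 1) // block_size, []).append(cluster)
    return [blocks[k] for k in sorted(blocks)]


def number_of_blocks(n_timestamps, block_size):
    """How many blocks a database with n_timestamps timestamps is cut into."""
    return math.ceil(n_timestamps / block_size)


# Toy database: one cluster per timestamp for 100 timestamps.
clusters = {f"c{t}": t for t in range(1, 101)}
blocks = split_into_blocks(clusters, 25)
print(len(blocks), [len(b) for b in blocks])

# With |T_DB| = 1,000, a small block-size yields many more blocks to process.
for size in (5, 25, 100):
    print(size, number_of_blocks(1000, size))
```

Each block contains far fewer items than the whole cluster matrix, but a very small block-size multiplies the number of blocks whose FCIs must be extracted and merged.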

5.2.2 Parameter Free Incremental GeT_Move Efficiency

The experimental results so far show that Incremental
GeT_Move and GeT_Move outperform the other algorithms.
Additionally, our algorithms can work with low ε and min_t
values. In this section, we perform another experiment to
examine the efficiency of the Parameter Free Incremental
GeT_Move algorithm. In this experiment, we compare the
performance of four algorithms: 1) Parameter Free Incremental
GeT_Move, named Nested Incremental GeT_Move, 2) Nested
GeT_Move, which is the application of GeT_Move on the nested
cluster matrix CM_N, 3) Incremental GeT_Move, which is
executed with the optimal block-size values on the original
cluster matrix CM, 4) GeT_Move, which is applied on the
original cluster matrix CM.



Efficiency w.r.t. real datasets. Figures 21 and 22 show that
Nested Incremental GeT_Move (i.e. Parameter Free Incremental
GeT_Move) greatly outperforms the other algorithms. This is
due to the better performance of the LCM algorithm on the
nested cluster matrix (resp. fully nested blocks) compared to
the original cluster matrix. Essentially, with the nested
cluster matrix, the number of combinations of closed itemsets
X and items i that must be examined to ensure closedness is
greatly reduced. Therefore, the performance of the LCM
algorithm is greatly improved. Indeed, Nested GeT_Move is
always better than GeT_Move (Figures 21, 22, 23).
Additionally, the Swainsoni and Buffalo datasets contain many
fully nested blocks (Table 6 and Figure 20). Consequently,
Nested Incremental GeT_Move is more efficient than the other
algorithms.
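
The combination of a closed itemset X with an item i mentioned above can be illustrated by the standard closure operation on which closed itemset miners such as LCM rely. The sketch below is a brute-force illustration, not the LCM algorithm itself; it assumes that objects play the role of transactions and clusters the role of items, and the toy database and function names are hypothetical.

```python
def objects_of(itemset, db):
    """Objects (transactions) whose cluster set contains every item of itemset."""
    return {obj for obj, clusters in db.items() if itemset <= clusters}


def closure(itemset, db):
    """Largest itemset shared by all objects that support `itemset`."""
    objs = objects_of(itemset, db)
    if not objs:
        return frozenset(itemset)
    return frozenset(set.intersection(*(set(db[o]) for o in objs)))


def extend(closed_x, item, db):
    """Combine a closed itemset X with an item i and close the result."""
    return closure(frozenset(closed_x) | {item}, db)


# Toy object-cluster database (object id -> clusters it belongs to).
db = {
    "o1": {"c1", "c2", "c4"},
    "o2": {"c1", "c2", "c3", "c4"},
    "o3": {"c1", "c3"},
}

print(sorted(closure(frozenset({"c2"}), db)))
print(sorted(extend(frozenset({"c1"}), "c3", db)))
```

Every (X, i) pair that a miner examines costs one such closure computation, so any reordering of the matrix that reduces the number of pairs to examine directly reduces the running time.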

Efficiency w.r.t. synthetic dataset. The performance of
Nested Incremental GeT_Move is quite similar to that of
Nested GeT_Move (Figure 23). The main reasons are that: 1)
the synthetic data is very sparse, 2) there are few fully
nested blocks, 3) the nested blocks contain a very small
number of items (i.e. 0.1% of the matrix is filled by '1' and
there are only 8 fully nested blocks, whose average length is
2; see Table 6 and Figure 20e-f). Therefore, the processing
time of the nested blocks is quite short. Meanwhile, there is
a large nested sparse block, which is the main partition that
needs to be processed by both Nested Incremental GeT_Move and
Nested GeT_Move.
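
The two quantities used in this discussion, matrix fill and full nestedness, can be computed on a small binary matrix as sketched below. The sketch assumes the usual definition of a fully nested 0/1 matrix (the sets of '1' positions of the rows form a chain under inclusion); the definition given earlier in the paper may be stated differently, and both function names are hypothetical.

```python
def matrix_fill(matrix):
    """Fraction of cells of a 0/1 matrix that are set to 1."""
    cells = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / cells


def is_fully_nested(matrix):
    """True if the rows' sets of 1-positions form a chain under inclusion."""
    supports = sorted(
        (frozenset(j for j, v in enumerate(row) if v) for row in matrix),
        key=len,
    )
    # After sorting by size, checking neighbours is enough (inclusion is transitive).
    return all(a <= b for a, b in zip(supports, supports[1:]))


nested = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
not_nested = [
    [1, 0],
    [0, 1],
]

print(matrix_fill(nested), is_fully_nested(nested))
print(matrix_fill(not_nested), is_fully_nested(not_nested))
```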

Additionally, thanks to the nested sparse block, the
performance of LCM is considerably improved. Therefore,
Nested Incremental GeT_Move and Nested GeT_Move are better
than the others in most cases. The exceptions are a small
number of objects |O_DB| (i.e. |O_DB| = 50, Figure 23c) or
high ε (i.e. ε > 9, Figure 23a), where Incremental GeT_Move
is slightly better than Nested Incremental GeT_Move and
Nested GeT_Move. The main reason is that Incremental GeT_Move
splits the cluster matrix CM into different small blocks,
each containing a small number of items and FCIs, which
reduces the computation cost. On the other hand, Nested
Incremental GeT_Move and Nested GeT_Move need to work with a
large nested sparse block.

5.2.3 Spatio-Temporal Pattern Mining Algorithm 
Based on Explicit Combination of FCI Pairs 

In this section, an experiment is designed to examine the
spatio-temporal pattern mining algorithm based on explicit
combination of FCI pairs and to identify when we should
update the database. We first use half of the Swainsoni,
Buffalo and Synthetic datasets as DB. Then the other half is
used to generate DB', which is increased step by step up to
the maximum size (Figure 24). In this experiment, Incremental
GeT_Move is employed to extract FCIs from DB and DB'.

For real datasets (Swainsoni and Buffalo), the explicit
combination algorithm is more efficient than Incremental
GeT_Move in all cases (Figure 24a, b). This is because we
already have FCIs_DB and therefore only need to extract
FCIs_DB' and then combine FCIs_DB and FCIs_DB'. Additionally,
the Swainsoni and Buffalo datasets are sufficiently dense
(i.e. matrix fill of 17.8% and 7.2% with a large number of
fully nested blocks, see Table 6) so that the numbers of FCIs
in FCIs_DB and FCIs_DB' are not huge. Consequently, the
number of combinations is reduced and thus the algorithm is
more efficient. In Figure 24a, b, the running time of the
explicit combination algorithm changes significantly when
|T_DB'| > 15% |T_DB|. This means that it is better to update
the database when |T_DB'| < 15% |T_DB|.

Table 6: Fully nested blocks on datasets.

Dataset      Matrix fill   #Nested blocks   avg. length
Swainsoni    17.8%         102              4.52
Buffalo      7.2%          602              2.894
Synthetic    0.1%          8                2.00


For the synthetic dataset, the explicit combination
algorithm is only efficient on small DB' (i.e. |T_DB'| <
20% |T_DB|, Figure 24c) because the dataset is very sparse.
In fact, the number of FCIs in FCIs_DB' grows as the size of
DB' increases. Thus, on larger DB' the explicit combination
algorithm is not efficient because of the huge number of
combinations.

Overall, the explicit combination algorithm obtains good
efficiency when |T_DB'| is smaller than 15% of |T_DB|.
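
The idea of combining stored FCIs with those of newly arrived data can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm as specified earlier in the paper: objects are transactions, clusters are items, DB and DB' share the same objects but have disjoint clusters, and each FCI is stored as a mapping from its object set to its cluster set. A pair is combined by intersecting the object sets and uniting the cluster sets. The names `combine_fcis` and `should_combine` are hypothetical, and the 15% default simply restates the threshold observed in the experiments.

```python
from itertools import product


def combine_fcis(fcis_db, fcis_new, all_objects, min_objects):
    """Combine every FCI pair from DB and DB' (one pair per combination).

    fcis_db / fcis_new: dict mapping frozenset(objects) -> frozenset(clusters).
    """
    top = frozenset(all_objects)
    # The empty itemset (supported by all objects) lets FCIs of one side survive alone.
    left = {top: frozenset(), **fcis_db}
    right = {top: frozenset(), **fcis_new}

    merged = {}
    for (objs_a, items_a), (objs_b, items_b) in product(left.items(), right.items()):
        objs = objs_a & objs_b
        if len(objs) < min_objects:
            continue  # too few objects travel together
        merged[objs] = merged.get(objs, frozenset()) | items_a | items_b
    return {objs: items for objs, items in merged.items() if items}


def should_combine(t_new, t_db, threshold=0.15):
    """Rule of thumb from the experiments: combine while DB' is small relative to DB."""
    return t_new < threshold * t_db


fs = frozenset
fcis_db = {fs({"o1", "o2", "o3"}): fs({"c1"}), fs({"o1", "o2"}): fs({"c1", "c2"})}
fcis_new = {fs({"o2", "o3"}): fs({"c3"}), fs({"o1", "o2"}): fs({"c4"})}

result = combine_fcis(fcis_db, fcis_new, {"o1", "o2", "o3"}, min_objects=2)
for objs, items in sorted(result.items(), key=lambda kv: sorted(kv[1])):
    print(sorted(objs), sorted(items))
print(should_combine(100, 1000), should_combine(200, 1000))
```

The nested loop makes the cost explicit: the number of combinations is the product of the sizes of the two FCI collections, which is why the approach degrades when FCIs_DB' becomes large.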

To summarize, Incremental GeT_Move and GeT_Move outperform
the other algorithms. Additionally, our algorithms can work
with low values of ε and min_t. To reach optimal efficiency,
we propose a Parameter Free Incremental GeT_Move (i.e. Nested
Incremental GeT_Move), which dynamically assigns fully nested
blocks to the algorithm from the nested cluster matrix. The
experimental results show that the efficiency is greatly
improved with Nested Incremental GeT_Move and Nested
GeT_Move. Furthermore, by storing FCIs in a closed itemset
database (see Figure 7), it is possible to reuse them
whenever new object movements arrive. The experimental
results show that it is better to update the database by
applying the explicit combination algorithm when |T_DB'| is
smaller than 15% of |T_DB|.

6. CONCLUSION AND DISCUSSION 

In this paper, we propose unifying (parameter free)
incremental approaches to automatically extract different
kinds of spatio-temporal patterns by applying frequent closed
itemset mining techniques. Their effectiveness and efficiency
have been evaluated using real and synthetic datasets.
Experiments show that our approaches outperform traditional
ones.

Another issue we plan to address is how to take into account
the arrival of new objects which were not available for the
first extraction. As we have seen, we can store the results
(resp. FCIs) to improve the process when new object movements
arrive. In this approach, we assume that the number of
objects remains the same. However, in some applications these
objects could be different.

7. REFERENCES 

[1] Gudmundsson J, van Kreveld M. Computing longest duration flocks in trajectory data. In: GIS 06, New York, NY, USA, pp.35-42.
[2] Vieira MR, Bakalov P, Tsotras VJ. On-line Discovery of Flock Patterns in Spatio-Temporal Data. In: GIS 09, New York, NY, USA, pp.286-295.
[3] Jeung H, Yiu ML, Zhou X, Jensen CS, Shen HT. Discovery of Convoys in Trajectory Databases. PVLDB 2008, 1(1):1068-1080.
[4] Kalnis P, Mamoulis N, Bakiras S. On Discovering Moving Clusters in Spatio-temporal Data. In: SSTD 2005, Angra dos Reis, Brazil, pp.364-381.




(a) Running time w.r.t. ε (b) Running time w.r.t. min_t (c) Running time w.r.t. |O_DB| (d) Running time w.r.t. |T_DB|

Figure 21: Running time on Swainsoni Dataset.
Figure 22: Running time on Buffalo Dataset.
Figure 23: Running time on Synthetic Dataset.
Figure 24: Explicit combinations algorithm efficiency.